Sensitivity and Specificity
How do we know if a test or classifier is reliable? In both medicine and machine learning, classifiers are used to assign a class to input data. Performance metrics are therefore necessary to assess the reliability of the resulting classification. They help answer questions like:
- How often does the test correctly identify the true class?
- How often does the test mistakenly identify the wrong class?
On the Origin of Sensitivity and Specificity
Originally, sensitivity and specificity were qualities of biological tests used in chemistry, serology and immunology.
- Sensitivity measured how reactive a test was, i.e. its ability to detect minute quantities of the target antigen.
- Specificity measured how selective a test was, i.e. its ability to react exclusively with the target antigen and not with unrelated molecules.
Later, in clinical epidemiology, Jacob Yerushalmy introduced sensitivity and specificity in 1947 as probabilities to assess the reliability of diagnoses in radiology:
- A measure of sensitivity or the probability of correct diagnosis of positive cases
- A measure of specificity or the probability of correct diagnosis of negative cases
Today, sensitivity and specificity are used as metrics beyond medicine, in data science and machine learning, where they evaluate the performance of binary classifiers.
Binary Classifier
One can consider a medical diagnosis as a binary classifier. A classifier is a process that takes input data and assigns it to a class. Such a classifier is binary when it can only choose between two possible classes.
For instance, a diagnosis based on X-ray data produces only two possible outputs: Positive or Negative.
| input X-ray data | diagnosis |
|---|---|
| case 1 | Positive |
| case 2 | Negative |
| case 3 | Positive |
| case 4 | Positive |
| case 5 | Negative |
The Confusion Matrix
The confusion matrix is a table that compares the classifier's predictions with the observed classes.
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Observed Positive | ✅ True Positive (TP) | ❌ False Negative (FN) |
| Observed Negative | ❌ False Positive (FP) | ✅ True Negative (TN) |
- ✅ True Positive (TP): The classifier correctly predicted a positive outcome, and the observed outcome was positive.
- ✅ True Negative (TN): The classifier correctly predicted a negative outcome, and the observed outcome was negative.
- ❌ False Positive (FP): The classifier incorrectly predicted a positive outcome, while the observed outcome was negative. It is also known as a Type I error.
- ❌ False Negative (FN): The classifier incorrectly predicted a negative outcome, while the observed outcome was positive. It is also known as a Type II error.
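As a minimal sketch, the four cells can be counted directly from paired predictions and observations. The observed labels below are hypothetical, added only to illustrate the computation; they are not part of the X-ray example above.

```python
# Minimal sketch: count confusion matrix cells from paired labels.
# The observed labels are hypothetical, used only for illustration.
predicted = ["Positive", "Negative", "Positive", "Positive", "Negative"]
observed  = ["Positive", "Negative", "Negative", "Positive", "Positive"]

pairs = list(zip(predicted, observed))
TP = sum(p == "Positive" and o == "Positive" for p, o in pairs)
TN = sum(p == "Negative" and o == "Negative" for p, o in pairs)
FP = sum(p == "Positive" and o == "Negative" for p, o in pairs)
FN = sum(p == "Negative" and o == "Positive" for p, o in pairs)

print(TP, TN, FP, FN)  # 2 1 1 1
```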
Sensitivity
The sensitivity is the proportion of observed positive cases that are correctly predicted by the test.
$$ \text{Sensitivity} = \frac{TP}{TP + FN} $$
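Using the hypothetical counts from the sketch above (TP = 2, FN = 1), sensitivity could be computed as:

```python
# Sensitivity: share of observed positives that were predicted positive.
TP, FN = 2, 1  # hypothetical counts from the sketch above
sensitivity = TP / (TP + FN)
print(round(sensitivity, 3))  # 0.667
```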
Specificity
The specificity is the proportion of observed negative cases that are correctly predicted by the test.
$$ \text{Specificity} = \frac{TN}{TN + FP} $$
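Likewise, with the hypothetical counts TN = 1 and FP = 1:

```python
# Specificity: share of observed negatives that were predicted negative.
TN, FP = 1, 1  # hypothetical counts from the sketch above
specificity = TN / (TN + FP)
print(specificity)  # 0.5
```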
Type I Error
The Type I Error or False Positive Rate impacts the precision of a test. It is the proportion of observed negative cases that the test incorrectly predicts as positive.
$$ \text{False Positive Rate} = \frac{FP}{FP + TN} $$
False Positive Rate can be expressed with specificity:
$$ \text{False Positive Rate} = 1 - \text{Specificity} $$
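With the same hypothetical counts, the identity can be checked numerically:

```python
# False Positive Rate and its relation to specificity.
FP, TN = 1, 1  # hypothetical counts from the sketch above
specificity = TN / (TN + FP)
fpr = FP / (FP + TN)
print(fpr, 1 - specificity)  # 0.5 0.5
```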
Type II Error
The Type II Error or False Negative Rate impacts the sensitivity of a test. It is the proportion of observed positive cases that the test fails to detect.
$$ \text{False Negative Rate} = \frac{FN}{FN + TP} $$
False Negative Rate can be expressed with sensitivity:
$$ \text{False Negative Rate} = 1 - \text{Sensitivity} $$
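The same check works for the False Negative Rate, still with the hypothetical counts:

```python
# False Negative Rate and its relation to sensitivity.
FN, TP = 1, 2  # hypothetical counts from the sketch above
sensitivity = TP / (TP + FN)
fnr = FN / (FN + TP)
print(round(fnr, 3), round(1 - sensitivity, 3))  # 0.333 0.333
```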
Accuracy
The accuracy is how often the test correctly predicts the outcome, for both positive and negative cases.
$$ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} $$
Accuracy can also be expressed as the weighted mean of sensitivity and specificity, weighted by the numbers of observed positive cases (P = TP + FN) and negative cases (N = TN + FP).
$$ \text{Accuracy} = \frac{P \cdot \text{Sensitivity} + N \cdot \text{Specificity}}{P + N} $$
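As a quick check with the same hypothetical counts (P = 3, N = 2), both expressions give the same value:

```python
# Accuracy: direct formula vs. weighted mean of sensitivity and specificity.
TP, TN, FP, FN = 2, 1, 1, 1  # hypothetical counts from the sketch above
P, N = TP + FN, TN + FP

accuracy = (TP + TN) / (TP + TN + FP + FN)
sensitivity = TP / (TP + FN)
specificity = TN / (TN + FP)
weighted = (P * sensitivity + N * specificity) / (P + N)

print(round(accuracy, 3), round(weighted, 3))  # 0.6 0.6
```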
References
Statistical Problems in Assessing Methods of Medical Diagnosis, with Special Reference to X-Ray Techniques
Jacob Yerushalmy
Public Health Reports. 1947 October 03. DOI: 10.2307/4586294
On the Origin of Sensitivity and Specificity
Nicholas Binney, Christopher Hyde, and Patrick M. Bossuyt
Annals of Internal Medicine. 2021 March 16. DOI: 10.7326/M20-5028
ORION: a web server for protein fold recognition and structure prediction using evolutionary hybrid profiles
Yassine Ghouzam, Guillaume Postic, Pierre-Edouard Guerin, Alexandre G. de Brevern & Jean-Christophe Gelly
Scientific Reports. 2016 Jun 20. DOI: 10.1038/srep28268