Sensitivity and Specificity
How do we know whether a test or classifier is reliable? In both medicine and machine learning, classifiers are used to assign a class to input data. Performance metrics are therefore necessary to assess the reliability of the resulting classification. They help answer questions like:
- How often does the test correctly identify the true class?
- How often does the test mistakenly identify the wrong class?
On the Origin of Sensitivity and Specificity
Originally, sensitivity and specificity were qualities of a biological test used in chemistry, serology and immunology.
- Sensitivity measured how reactive a test was, i.e. its ability to detect minute quantities of the target antigen.
- Specificity measured how selective a test was, i.e. its ability to react exclusively with the target antigen and not with unrelated molecules.
Later, in clinical epidemiology, Jacob Yerushalmy introduced sensitivity and specificity in 1947 as probabilities to assess the reliability of diagnoses in radiology:
- A measure of sensitivity or the probability of correct diagnosis of positive cases
- A measure of specificity or the probability of correct diagnosis of negative cases
Today, sensitivity and specificity are used as metrics beyond medicine, in data science and machine learning, where they evaluate the performance of binary classifiers.
Binary Classifier
One can consider a medical diagnosis as a binary classifier. A classifier is a process that takes input data and assigns it to a class. Such a classifier is binary when it can only choose between two possible classes.
For instance, a diagnosis based on X-ray data only produces two possible outputs: Positive or Negative.
| input X-ray data | diagnosis |
|---|---|
| case 1 | Positive |
| case 2 | Negative |
| case 3 | Positive |
| case 4 | Positive |
| case 5 | Negative |
The Confusion Matrix
The confusion matrix is a table that compares the classifier's predictions with the observed classes.
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Observed Positive | ✅ True Positive (TP) | ❌ False Negative (FN) |
| Observed Negative | ❌ False Positive (FP) | ✅ True Negative (TN) |
- ✅ True Positive (TP): The classifier predicted a positive outcome and the observed outcome was positive.
- ✅ True Negative (TN): The classifier predicted a negative outcome and the observed outcome was negative.
- ❌ False Positive (FP): The classifier predicted a positive outcome but the observed outcome was negative. It is also known as a Type I error.
- ❌ False Negative (FN): The classifier predicted a negative outcome but the observed outcome was positive. It is also known as a Type II error.
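As a quick illustration, the four cells can be counted directly from paired lists of observed and predicted labels. This is a minimal Python sketch; the example data are hypothetical.

```python
# Hypothetical observed classes and classifier predictions
observed  = ["Positive", "Negative", "Positive", "Positive", "Negative"]
predicted = ["Positive", "Positive", "Negative", "Positive", "Negative"]

pairs = list(zip(observed, predicted))
# Count each cell of the confusion matrix
tp = sum(1 for o, p in pairs if o == "Positive" and p == "Positive")
tn = sum(1 for o, p in pairs if o == "Negative" and p == "Negative")
fp = sum(1 for o, p in pairs if o == "Negative" and p == "Positive")
fn = sum(1 for o, p in pairs if o == "Positive" and p == "Negative")

print(tp, tn, fp, fn)  # 2 1 1 1
```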
Sensitivity
The sensitivity is the proportion of observed positive cases that the test correctly predicts.
$$ \text{Sensitivity} = \frac{TP}{TP + FN} $$
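Translated into code, the formula is a one-liner; the counts below are hypothetical.

```python
def sensitivity(tp, fn):
    """Proportion of observed positive cases the test correctly predicts."""
    return tp / (tp + fn)

# e.g. 90 true positives and 10 false negatives among 100 observed positives
print(sensitivity(90, 10))  # 0.9
```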
Specificity
The specificity is the proportion of observed negative cases that the test correctly predicts.
$$ \text{Specificity} = \frac{TN}{TN + FP} $$
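The same one-liner works for specificity, again with hypothetical counts.

```python
def specificity(tn, fp):
    """Proportion of observed negative cases the test correctly predicts."""
    return tn / (tn + fp)

# e.g. 95 true negatives and 5 false positives among 100 observed negatives
print(specificity(95, 5))  # 0.95
```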
Type I Error
The Type I Error, or False Positive Rate, is the proportion of observed negative cases that the test incorrectly classifies as positive. A high False Positive Rate degrades the precision of a test.
$$ \text{False Positive Rate} = \frac{FP}{FP + TN} $$
False Positive Rate can be expressed with specificity:
$$ \text{False Positive Rate} = 1 - \text{Specificity} $$
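This identity can be checked numerically with hypothetical counts:

```python
# Hypothetical counts: 5 false positives, 95 true negatives
fp, tn = 5, 95
fpr = fp / (fp + tn)     # direct definition of the False Positive Rate
spec = tn / (tn + fp)    # specificity
# FPR and 1 - specificity agree (up to floating-point rounding)
assert abs(fpr - (1 - spec)) < 1e-12
```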
Type II Error
The Type II Error, or False Negative Rate, is the proportion of observed positive cases that the test fails to detect. A high False Negative Rate degrades the sensitivity of a test.
$$ \text{False Negative Rate} = \frac{FN}{FN + TP} $$
False Negative Rate can be expressed with sensitivity:
$$ \text{False Negative Rate} = 1 - \text{Sensitivity} $$
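The same kind of numerical check works here, with hypothetical counts:

```python
# Hypothetical counts: 10 false negatives, 90 true positives
fn, tp = 10, 90
fnr = fn / (fn + tp)     # direct definition of the False Negative Rate
sens = tp / (tp + fn)    # sensitivity
# FNR and 1 - sensitivity agree (up to floating-point rounding)
assert abs(fnr - (1 - sens)) < 1e-12
```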
Accuracy
The accuracy is how often the test correctly predicts the outcome, for both positive and negative cases.
$$ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} $$
Accuracy can be expressed as the mean of sensitivity and specificity, weighted by the number of observed positive cases ($P = TP + FN$) and observed negative cases ($N = TN + FP$):
$$ \text{Accuracy} = \frac{P \cdot \text{Sensitivity} + N \cdot \text{Specificity}}{P + N} $$
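This identity can also be verified numerically. The sketch below uses hypothetical confusion-matrix counts, with P = TP + FN and N = TN + FP as above.

```python
# Hypothetical confusion-matrix counts
tp, tn, fp, fn = 40, 45, 5, 10
p, n = tp + fn, tn + fp                     # observed positives / negatives

accuracy = (tp + tn) / (tp + tn + fp + fn)  # direct definition
sens = tp / (tp + fn)                       # sensitivity
spec = tn / (tn + fp)                       # specificity
weighted = (p * sens + n * spec) / (p + n)  # weighted mean

# Both expressions give the same accuracy (0.85 here)
assert abs(accuracy - weighted) < 1e-12
```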
References
Statistical Problems in Assessing Methods of Medical Diagnosis, with Special Reference to X-Ray Techniques
Jacob Yerushalmy
Public health reports. 1947 October 03. DOI: 10.2307/4586294
On the Origin of Sensitivity and Specificity
Nicholas Binney, Christopher Hyde, and Patrick M. Bossuyt
Annals of Internal Medicine. 2021 March 16. DOI: 10.7326/M20-5028
ORION: a web server for protein fold recognition and structure prediction using evolutionary hybrid profiles
Yassine Ghouzam, Guillaume Postic, Pierre-Edouard Guerin, Alexandre G. de Brevern & Jean-Christophe Gelly
Scientific Reports. 2016 Jun 20. DOI: 10.1038/srep28268