Understanding Classification Model Metrics: Precision, Recall, F1, F2, Accuracy, ROC, and AUC
Published 13 May 2025
Evaluating the performance of classification models is crucial to understanding how well they can make predictions. Various metrics provide insights into different aspects of model performance, enabling data scientists to choose the best model for their specific needs. This blog will explore essential classification metrics, including precision, recall, F1 score, F2 score, accuracy, ROC curve, and AUC (Area Under the Curve).
Definition: Accuracy is the simplest metric, representing the proportion of correct predictions out of all predictions made. It’s calculated as:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Where:
- TP = True Positives (positive cases predicted as positive)
- TN = True Negatives (negative cases predicted as negative)
- FP = False Positives (negative cases predicted as positive)
- FN = False Negatives (positive cases predicted as negative)
Limitations: Accuracy can be misleading in cases of class imbalance, where one class is far more prevalent than the other; a model that always predicts the majority class can still score a high accuracy while never identifying the minority class.
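As a minimal sketch, the formula can be applied directly to confusion-matrix counts; the numbers below are hypothetical, not taken from a real model:

```python
# Hypothetical confusion-matrix counts (illustrative only)
TP, TN, FP, FN = 70, 20, 5, 5

accuracy = (TP + TN) / (TP + TN + FP + FN)
print(f"Accuracy: {accuracy:.2f}")  # 0.90
```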
Definition: Precision measures the accuracy of the positive predictions made by the model. It tells us how many of the predicted positive cases were actually positive.
Precision = TP / (TP + FP)
Example: If a model predicts 10 clients as likely to purchase a product and only 7 actually do, the precision is 0.7 or 70%.
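Assuming scikit-learn is available, the example can be reproduced with precision_score; the label arrays below are made up to match the 10-predicted / 7-correct scenario:

```python
from sklearn.metrics import precision_score

# 10 clients predicted as buyers (1); only 7 of them actually purchased
y_pred = [1] * 10
y_true = [1] * 7 + [0] * 3

print(precision_score(y_true, y_pred))  # 0.7
```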
Definition: Recall measures the ability of a model to find all the relevant cases (true positives). It answers the question, "Of all actual positive cases, how many did we capture?"
Recall = TP / (TP + FN)
Example: If there are 100 actual positive cases and the model correctly identifies 80, the recall is 0.8 or 80%.
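A similar sketch for recall, again with invented labels that mirror the 100-actual / 80-captured example (scikit-learn assumed):

```python
from sklearn.metrics import recall_score

# 100 actual positive cases; the model correctly identifies 80 of them
y_true = [1] * 100
y_pred = [1] * 80 + [0] * 20

print(recall_score(y_true, y_pred))  # 0.8
```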
Definition: The F1 score is the harmonic mean of precision and recall. It is useful when you need a single metric that balances the two, for example when false positives and false negatives carry roughly similar costs.
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
Example: If precision is 0.6 and recall is 0.8, the F1 score is calculated as follows:
F1 Score = 2 × (0.6 × 0.8) / (0.6 + 0.8) = 0.96 / 1.4 ≈ 0.6857
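The same arithmetic in Python, using the precision and recall values from the example:

```python
precision, recall = 0.6, 0.8

f1 = 2 * (precision * recall) / (precision + recall)
print(round(f1, 4))  # 0.6857
```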
Definition: Similar to the F1 score, the F2 score combines precision and recall, but it weights recall more heavily than precision. It penalizes false negatives more than false positives, making it useful in medical and safety-critical applications where missing a positive case is especially costly.
F2 Score = (1 + 2^2) × Precision × Recall / (2^2 × Precision + Recall) = 5 × Precision × Recall / (4 × Precision + Recall)
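Reusing the illustrative precision and recall values from the F1 example, the F2 formula works out as below; scikit-learn's fbeta_score with beta=2 computes the same quantity directly from labels:

```python
precision, recall = 0.6, 0.8
beta = 2  # weights recall more heavily than precision

f2 = (1 + beta**2) * (precision * recall) / (beta**2 * precision + recall)
print(round(f2, 4))  # 0.75
```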
Definition: The Receiver Operating Characteristic (ROC) curve is a graphical representation that illustrates the diagnostic ability of a binary classifier as its discrimination threshold is varied. The curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR).
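As a sketch of how the curve is typically produced (the labels and predicted probabilities below are invented, and scikit-learn plus matplotlib are assumed):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

# Hypothetical true labels and predicted probabilities
y_true = [0, 0, 1, 1, 0, 1, 1, 0, 1, 0]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.3, 0.7, 0.5]

# TPR and FPR at every threshold the scores allow
fpr, tpr, thresholds = roc_curve(y_true, y_scores)

plt.plot(fpr, tpr, label="ROC curve")
plt.plot([0, 1], [0, 1], linestyle="--", label="Random guess")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
```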
Definition: AUC is a single scalar value that summarizes the performance of a classifier across all thresholds. An AUC of 1.0 indicates perfect classification, while an AUC of 0.5 suggests no discriminative power.
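With the same hypothetical labels and scores as in the ROC sketch, the AUC is a single call (scikit-learn assumed):

```python
from sklearn.metrics import roc_auc_score

# Same hypothetical labels and scores as in the ROC sketch above
y_true = [0, 0, 1, 1, 0, 1, 1, 0, 1, 0]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.3, 0.7, 0.5]

print(roc_auc_score(y_true, y_scores))  # 0.92 for these illustrative values
```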
Definition: A confusion matrix is a table that is often used to describe the performance of a classification model. It presents the counts of True Positives, True Negatives, False Positives, and False Negatives.
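A minimal sketch with invented labels; note that scikit-learn's confusion_matrix returns the counts in the order [[TN, FP], [FN, TP]]:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")  # TP=3, TN=3, FP=1, FN=1
```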