{"id":22758,"date":"2021-04-22T08:04:00","date_gmt":"2021-04-22T08:04:00","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/6-useful-metrics-evaluate-binary-classification-models\/"},"modified":"2023-08-24T13:43:03","modified_gmt":"2023-08-24T13:43:03","slug":"6-useful-metrics-evaluate-binary-classification-models","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/6-useful-metrics-evaluate-binary-classification-models\/","title":{"rendered":"6 Useful Metrics To Evaluate Binary Classification Models"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"22758\" class=\"elementor elementor-22758\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-9a6fa82 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"9a6fa82\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-a3acdbf\" data-id=\"a3acdbf\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-71d6ad9 elementor-widget elementor-widget-heading\" data-id=\"71d6ad9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">This is Newt! 
And he has a binary classification problem<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-117a29d elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"117a29d\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-e0903b6\" data-id=\"e0903b6\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-172d9ad elementor-widget elementor-widget-text-editor\" data-id=\"172d9ad\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<figure class=\"aligncenter\"><img decoding=\"async\" src=\"http:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/taylor-nicole-uztru-lwze4-unsplash.jpg\" alt=\"6 Useful Metrics To Evaluate Binary Classification Models\" class=\"wp-image-1115\"\/><figcaption>Photo by&nbsp;<a href=\"https:\/\/unsplash.com\/@taynicole0630?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" target=\"_blank\" rel=\"noreferrer noopener\">Taylor Nicole<\/a>&nbsp;on&nbsp;<a href=\"https:\/\/unsplash.com\/s\/photos\/boy?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" target=\"_blank\" rel=\"noreferrer noopener\">Unsplash<\/a><\/figcaption><\/figure>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-327018f elementor-widget elementor-widget-text-editor\" data-id=\"327018f\" 
data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>This is Newt. Living in a mythical world, Newt aspires to become the best dragon trainer. But a dragon only accepts someone as its forever owner if that person is the first creature it sees right after hatching. A bit like love at first sight: coincidental yet so precious!<\/p>\n<p>That\u2019s why Newt has been searching high and low for hatchable dragon eggs. Unfortunately, a hatchable egg is really difficult to come by.<\/p>\n<p>What\u2019s more? Distinguishing between hatchable eggs and unhatchable ones is super tedious.&nbsp;<strong>Newt often spends hour after hour examining dragon eggs of different shapes, from various species with distinct appearances, along with god only knows how many environmental factors that could make an egg less likely to hatch.<\/strong><\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4791bc9 elementor-widget elementor-widget-text-editor\" data-id=\"4791bc9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<figure class=\"aligncenter\"><img decoding=\"async\" src=\"http:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/eggs.png\" alt=\"Which eggs are likely to hatch\" class=\"wp-image-1116\"\/><figcaption>Image by Author<\/figcaption><\/figure>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c30873f elementor-widget elementor-widget-text-editor\" data-id=\"c30873f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Poor Newt can\u2019t afford to hatch all the eggs he found because his incubator 
only has limited slots. Thus, he has to find a better way before driving himself nuts. After all, he wants to be a skillful dragon trainer, not a professional egg analyst.<\/p>\n<p>On his way to the forest to search for dragon eggs, Newt bumped into Max, an avid tech-lover. Max shared his ideas on&nbsp;<strong>how to teach a computer to identify hatchable eggs based on egg images and the related environmental readings where the egg was found.<\/strong><\/p>\n<p>And boom! That\u2019s how Newt got started with machine learning. Specifically, Newt needs to train a classification model to distinguish hatchable eggs from unhatchable ones.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5fac988 elementor-widget elementor-widget-heading\" data-id=\"5fac988\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Spoilt for choice. Which one to choose?<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e839e3d elementor-widget elementor-widget-text-editor\" data-id=\"e839e3d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>When it comes to classification models, Newt is spoilt for choice: Logistic Regression, XGBoost Classifier, Random Forest Classifier, AdaBoost Classifier and so on.<\/p>\n<p>Even if Newt can narrow it down to a single model type, he still has to choose the best one among different variations as he tunes different hyperparameters (a.k.a. hyperparameter optimisation) or utilises different features (a.k.a. feature engineering).<\/p>\n<p>Simply put, among different model types, fine-tuned hyperparameters and features, Newt needs a quantifiable way to pick the best classification model. 
And that\u2019s what <a href=\"https:\/\/www.experfy.com\/blog\/bigdata-cloud\/evaluation-metrics-for-classification\/\" target=\"_blank\" rel=\"noreferrer noopener\">evaluation metrics<\/a> are for.<\/p>\n<p>In the next sections, we will explore:<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3c4a4a7 elementor-widget elementor-widget-text-editor\" data-id=\"3c4a4a7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul><li>Confusion matrix: the basis of all metrics<\/li><li>Accuracy, precision, recall, F1 Score<\/li><li>ROC curve and ROC AUC<\/li><\/ul>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d59b178 elementor-widget elementor-widget-heading\" data-id=\"d59b178\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Confusion matrix: The basis of all metrics<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8e0f78c elementor-widget elementor-widget-text-editor\" data-id=\"8e0f78c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<figure class=\"aligncenter\"><img decoding=\"async\" src=\"http:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/how-to-classify-your-dragon-eggs-2.png\" alt=\"6 Useful Metrics To Evaluate Binary Classification Models\" class=\"wp-image-1074\"\/><figcaption>Image by Author<\/figcaption><\/figure>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1012a50 elementor-widget elementor-widget-text-editor\" data-id=\"1012a50\" 
data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>A confusion matrix is just a way to record how many times the classification model correctly or incorrectly classifies things into the corresponding buckets.<\/p>\n<p>For example, the model initially classified 10 eggs as hatchable. However, out of those 10 eggs, only 6 are hatchable while the remaining 4 are unhatchable. In this case, the True Positive (TP) is 6 while the False Positive (FP) is 4.<\/p>\n<p>Similarly, suppose the model classified 10 eggs as unhatchable, of which 7 are actually unhatchable while the remaining 3 can hatch. We say the True Negative (TN) is 7 while the False Negative (FN) is 3.<\/p>\n<p>You might also have already heard about Type I and Type II errors in statistical hypothesis testing. Simply put, a False Positive is a Type I error while a False Negative is a Type II error.<\/p>\n<p>The most important takeaway here is that&nbsp;<strong>False Positive and False Negative imply two different impacts.<\/strong><\/p>\n<p>For instance,&nbsp;<strong>Newt would be wasting time and limited slots in his incubator to care for too many unhatchable eggs if the model results in too many False Positives<\/strong>. 
On the flip side,&nbsp;<strong>if there are too many False Negatives, Newt would be wasting a lot of hatchable dragon eggs because he won\u2019t incubate those that the model has wrongly classified as unhatchable<\/strong>.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f80b155 elementor-widget elementor-widget-heading\" data-id=\"f80b155\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Accuracy, recall, precision and F1 score<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e80a9bd elementor-widget elementor-widget-text-editor\" data-id=\"e80a9bd\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>The absolute counts across the 4 quadrants of the confusion matrix can make it challenging for an average Newt to compare different models. 
Therefore, people often summarise the confusion matrix with the metrics below: accuracy, recall, precision and F1 score.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1e6902f elementor-widget elementor-widget-text-editor\" data-id=\"1e6902f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<figure class=\"aligncenter\"><img decoding=\"async\" src=\"http:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/how-to-classify-your-dragon-eggs-1-1.png\" alt=\"Accuracy, Recall, Precision &amp; F1 Score\" class=\"wp-image-1101\"\/><figcaption>Image by Author<\/figcaption><\/figure>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9b2256a elementor-widget elementor-widget-text-editor\" data-id=\"9b2256a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>In a typical ML project, these counts and calculations are already automated. 
Hence, you can easily retrieve these predefined values with&nbsp;<a rel=\"noreferrer noopener\" href=\"https:\/\/scikit-learn.org\/stable\/modules\/model_evaluation.html\" target=\"_blank\">sklearn.metrics<\/a>,&nbsp;<a rel=\"noreferrer noopener\" href=\"https:\/\/www.tensorflow.org\/api_docs\/python\/tf\/keras\/metrics\" target=\"_blank\">tf.keras.metrics<\/a>&nbsp;and so on.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3989b7a elementor-widget elementor-widget-text-editor\" data-id=\"3989b7a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>However, not understanding how the counts are distributed across the 4 quadrants of the confusion matrix and blindly relying on a single metric could be a risky move. Below is an overview of each metric and where it falls short.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e193c81 elementor-widget elementor-widget-heading\" data-id=\"e193c81\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Accuracy<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-005a4e0 elementor-widget elementor-widget-text-editor\" data-id=\"005a4e0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Accuracy is probably the most intuitive metric to understand because it focuses on how often the prediction aligns with reality (i.e. True Positive and True Negative). 
So a model with 0.99 accuracy seems to be way better than our current model with 0.75 accuracy, right?<\/p>\n<p>Not so fast!<\/p>\n<p><strong>High accuracy can be misleading because it does not show how the correct predictions are split between True Positive and True Negative.&nbsp;<\/strong>For example, when hatchable eggs are rare, I can simply classify all eggs as unhatchable to obtain the confusion matrix below, together with a model boasting 99% accuracy.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a7c5295 elementor-widget elementor-widget-text-editor\" data-id=\"a7c5295\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<figure class=\"aligncenter\"><img decoding=\"async\" src=\"http:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/high-accuracy.png\" alt=\"6 Useful Metrics To Evaluate Binary Classification Models\" class=\"wp-image-1082\"\/><figcaption>Image by Author<\/figcaption><\/figure>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-23b9acd elementor-widget elementor-widget-text-editor\" data-id=\"23b9acd\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Pretty sure Newt will scream his lungs out because the model is clearly useless in helping him find hatchable eggs since all are labelled as unhatchable anyway.<\/p>\n<p>Even if predictions are spread out between hatchable and unhatchable, there is still another issue.&nbsp;<strong>Accuracy doesn\u2019t tell Newt what types of errors the classification model is making.<\/strong><\/p>\n<p>Remember how I said earlier that different errors mean different impacts for Newt? 
Ignoring False Positive and False Negative completely means Newt could end up with a model that wastes his precious time, incubation slots or dragon eggs.<\/p>\n<p>Luckily, two metrics take False Positive and False Negative into account. Say hello to precision and recall!<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7ecac79 elementor-widget elementor-widget-heading\" data-id=\"7ecac79\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Precision &amp; Recall<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b429fe5 elementor-widget elementor-widget-text-editor\" data-id=\"b429fe5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>If hatchable eggs are what Newt focuses on, precision aims to answer one question: Of all the eggs that the model classifies as hatchable (TP + FP), what fraction can actually be hatched into dragons (TP)?<\/p>\n<p>On the other hand, recall (also known as sensitivity) focuses on a very different angle of the problem: Among all eggs that can be hatched into dragons (TP + FN), what fraction does the model spot (TP)?<\/p>\n<p><strong>Both precision and recall range from 0 to 1. As a general rule of thumb, the closer to 1, the better the model is.<\/strong>&nbsp;Unfortunately, you usually can\u2019t have the best of both worlds because, for a given model,&nbsp;<strong>increasing precision tends to cause recall to drop and vice versa<\/strong>. 
The image below illustrates this&nbsp;<strong><em>precision-recall trade-off.<\/em><\/strong><\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-99bbca9 elementor-widget elementor-widget-text-editor\" data-id=\"99bbca9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<figure class=\"aligncenter\"><img decoding=\"async\" src=\"http:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/precision-recall-tradeoff.png\" alt=\"Precision-Recall Trade-off\" class=\"wp-image-1094\"\/><figcaption>Image by Author<\/figcaption><\/figure>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-50bab86 elementor-widget elementor-widget-text-editor\" data-id=\"50bab86\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Here is a simple way to imagine what\u2019s going on between precision and recall.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-81e45db elementor-widget elementor-widget-text-editor\" data-id=\"81e45db\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ol><li>If we classify all eggs as hatchable (i.e. all positive), then FN = 0 while FP increases significantly. Consequently, recall is now 1 while precision would drop closer to 0.<\/li><li>If we classify all eggs as unhatchable (i.e. all negative), then FP = 0 whereas FN rises drastically. 
This means recall drops to 0, while precision is no longer penalised by any False Positive (strictly speaking, it becomes undefined because nothing is classified as positive).<\/li><\/ol>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-67a0fec elementor-widget elementor-widget-text-editor\" data-id=\"67a0fec\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Given the trade-off between precision and recall, how should Newt choose the best classification model? Well,&nbsp;<strong>Newt would have to ask himself whether reducing False Negative is more or less important than minimising False Positive.<\/strong><\/p>\n<p>Remember I said earlier that False Positive and False Negative mean different impacts? Specifically, Newt would have to make a conscious choice between wasting hatchable dragon eggs (tolerating more False Negatives in favour of high precision) and wasting time and incubation slots (tolerating more False Positives in favour of high recall).<\/p>\n<p>Since life is precious and dragon eggs are so difficult to come by, a dedicated dragon lover like Newt could be more willing to choose a model having high recall with low precision.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-616f6ff elementor-widget elementor-widget-heading\" data-id=\"616f6ff\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">F1 Score<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-eb95097 elementor-widget elementor-widget-text-editor\" data-id=\"eb95097\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>But what if our scenario indicates both 
precision and recall are essential? Well, that\u2019s when F1 Score comes into the picture.<\/p>\n<p>F1 Score is the harmonic mean of the model\u2019s precision and recall, i.e. 2 \u00d7 precision \u00d7 recall \/ (precision + recall). Like recall and precision, the closer it is to 1, the better the model is.<\/p>\n<p>This metric is often&nbsp;<strong>useful for evaluating classification models when neither precision nor recall is clearly more important.<\/strong><\/p>\n<p>In real-life datasets, the data can be imbalanced, with one class appearing much more often than the other. For example, fraud cases are typically much rarer than normal transactions.&nbsp;<strong>F1 Score would also come in handy to evaluate classification models for such imbalanced datasets.<\/strong><\/p>\n<p>And this also concludes our section about the 4 basic metrics based on the almighty confusion matrix. In the next section, let\u2019s take it up a notch with the Receiver Operating Characteristic (ROC) curve.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-befb0fd elementor-widget elementor-widget-heading\" data-id=\"befb0fd\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">ROC curve and ROC AUC<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0d6a234 elementor-widget elementor-widget-heading\" data-id=\"0d6a234\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">ROC curve<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f69c8e0 elementor-widget elementor-widget-text-editor\" data-id=\"f69c8e0\" data-element_type=\"widget\" data-e-type=\"widget\" 
data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>I find the story behind the name ROC somewhat interesting, so here is how it went! During World War II, the US army wanted to improve the ability to detect enemy objects on battlefields. Among all initiatives, the ROC curve was developed to measure the ability of a radar receiver operator to correctly identify Japanese aircraft based on radar signals.<\/p>\n<p>Fast forward to today, the ROC curve is used in various fields such as medicine, radiology and meteorology, as well as machine learning. Nevertheless, people still refer to it by its original name: Receiver Operating Characteristic (ROC) curve.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c5dab77 elementor-widget elementor-widget-text-editor\" data-id=\"c5dab77\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<figure class=\"aligncenter\"><img decoding=\"async\" src=\"http:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/roc-curve.png\" alt=\"6 Useful Metrics To Evaluate Binary Classification Models\" class=\"wp-image-1105\"\/><figcaption>Image by Author<\/figcaption><\/figure>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f9457dd elementor-widget elementor-widget-text-editor\" data-id=\"f9457dd\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Let\u2019s take a look at the ROC curve shown above. I know the name and the look of the graph may seem a bit intimidating. 
But at its core, below are 4 key points you need to know.<\/p>\n<p>Firstly,&nbsp;<strong>an ROC curve is a graph showing the performance of a classification model across all decision thresholds.<\/strong><\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1829399 elementor-widget elementor-widget-text-editor\" data-id=\"1829399\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul><li>Generally, the closer the ROC curve is to the upper left corner, the better performance the model has.<\/li><li>At the bare minimum, the ROC curve of a model has to be above the black dotted line (which shows the model at least performs better than a random guess).<\/li><\/ul>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2f384b4 elementor-widget elementor-widget-text-editor\" data-id=\"2f384b4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Secondly, the performance of the model is measured by 2 parameters:<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3910dc5 elementor-widget elementor-widget-text-editor\" data-id=\"3910dc5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul><li>True Positive (TP) rate: a.k.a. recall<\/li><li>False Positive (FP) rate: a.k.a. 
probability of a false alarm<\/li><\/ul>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-deb0c5b elementor-widget elementor-widget-text-editor\" data-id=\"deb0c5b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Thirdly,&nbsp;<strong>a decision threshold is the value used to convert a predicted probability into a class label.&nbsp;<\/strong>For example, let\u2019s say Newt chooses a threshold of 0.6 for hatchable eggs. If the model calculates that the probability of an egg being hatchable is greater than or equal to 0.6, that egg will be classified as hatchable. Conversely, if the probability is below 0.6, that egg is classified as unhatchable.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-77cca7a elementor-widget elementor-widget-text-editor\" data-id=\"77cca7a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Finally,&nbsp;<strong>as we choose a lower threshold, more items will be classified as positive. 
This leads to more TP and FP, thus boosting the TP rate and FP rate accordingly.<\/strong><\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6ce08ab elementor-widget elementor-widget-text-editor\" data-id=\"6ce08ab\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Apart from visualising model performance, the ROC curve also illustrates a crucial point:&nbsp;<strong>Determining the ideal threshold requires trade-offs between TP rate and FP rate in a way that makes sense for your business objectives.<\/strong><\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6f496b4 elementor-widget elementor-widget-text-editor\" data-id=\"6f496b4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>In this case, if Newt chooses too high a threshold, he might be wasting a lot of dragon eggs because most are wrongly classified as unhatchable. 
On the flip side, too low a threshold could see him spending months incubating lots of eggs without reaping any rewards.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d8ded95 elementor-widget elementor-widget-heading\" data-id=\"d8ded95\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">ROC AUC<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c5c4471 elementor-widget elementor-widget-text-editor\" data-id=\"c5c4471\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<figure class=\"aligncenter\"><img decoding=\"async\" src=\"http:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/roc-auc.png\" alt=\"ROC AUC\" class=\"wp-image-1110\"\/><figcaption>Image by Author<\/figcaption><\/figure>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-39fa145 elementor-widget elementor-widget-text-editor\" data-id=\"39fa145\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Looking at the picture above, it\u2019s relatively easy to see the blue curve is above the yellow curve, indicating better performance. 
But what if we have a few more curves representing different models?<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c08f11d elementor-widget elementor-widget-text-editor\" data-id=\"c08f11d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Well, the diagram could become too cluttered for anyone to decipher which is which. What\u2019s more, many people find it much simpler to compare a single numeric value than to compare curves.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3aa0f98 elementor-widget elementor-widget-text-editor\" data-id=\"3aa0f98\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>That\u2019s where AUC, which stands for Area Under the Curve, comes in handy. 
Ranging from 0 to 1, AUC measures the entire two-dimensional area underneath the ROC curve.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ed5972a elementor-widget elementor-widget-text-editor\" data-id=\"ed5972a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Without getting too nerdy on the mathematics, here is what you need to know:&nbsp;<strong>the higher the AUC value, the better the model performs at classification.<\/strong>&nbsp;<strong>At the very least, a model\u2019s AUC has to be greater than 0.5, the score expected from random guessing.<\/strong>&nbsp;Otherwise, why should we waste time with machine learning anyway?<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-958fc54 elementor-widget elementor-widget-heading\" data-id=\"958fc54\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Parting Thoughts<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4b10105 elementor-widget elementor-widget-text-editor\" data-id=\"4b10105\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>So there you have it! No more confusion about what a confusion matrix is and which evaluation metrics you should focus on for your next binary classification challenge.&nbsp;<strong>I can\u2019t stress enough how important it is to pick the right metrics that make the most sense for your business objectives. 
Otherwise, you could end up choosing a model that appears to be the best, yet lands you in hot water shortly after.<\/strong><\/p>\n<p>That\u2019s all I have for this blog post. Let\u2019s say bye to Newt for now and wish him luck on his quest to become the best dragon trainer in the world!<\/p>\n<p>Thank you for reading. Have feedback on how I can do better or just wanna chat? Let me know in the comments or find me on&nbsp;<a href=\"https:\/\/www.linkedin.com\/in\/skyetran\/\" target=\"_blank\" rel=\"noreferrer noopener\">LinkedIn<\/a>. Have a good one, ladies and gents!<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b793b8f elementor-widget elementor-widget-heading\" data-id=\"b793b8f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">References<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7fd2ea7 elementor-widget elementor-widget-text-editor\" data-id=\"7fd2ea7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ol><li><a rel=\"noreferrer noopener\" href=\"https:\/\/developers.google.com\/machine-learning\/crash-course\/classification\/roc-and-auc\" target=\"_blank\">Classification: ROC Curve and AUC<\/a><\/li><li><a rel=\"noreferrer noopener\" href=\"https:\/\/learning.oreilly.com\/library\/view\/hands-on-machine-learning\/9781492032632\/\" target=\"_blank\">Hands-on Machine Learning with Scikit-Learn, Keras and TensorFlow<\/a><\/li><li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/model_evaluation.html\" target=\"_blank\" rel=\"noreferrer noopener\">Scikit-learn Metrics &amp; 
Scoring<\/a><\/li><\/ol>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>No more confusion about what confusion matrix is and which evaluation metrics you should focus on for your next binary classification challenge.<\/p>\n","protected":false},"author":1115,"featured_media":23571,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[1519,1072,1520,670],"ppma_author":[3849],"class_list":["post-22758","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-accuracy","tag-binary-classification","tag-confusion-matrix","tag-models"],"authors":[{"term_id":3849,"user_id":1115,"is_guest":0,"slug":"skye","display_name":"Nguyen Huong (Skye) Tran","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Nguyen-Huong-Skye-Tran-150x150.jpeg","user_url":"https:\/\/thedigitalskye.com\/","last_name":"Huong (Skye) Tran","first_name":"Nguyen","job_title":"","description":"Nguyen Huong (Skye) Tran is a Technical Writer at iQ Consult Pty Ltd, and an IT 
consultant."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22758","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/1115"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=22758"}],"version-history":[{"count":5,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22758\/revisions"}],"predecessor-version":[{"id":31462,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22758\/revisions\/31462"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/23571"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=22758"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=22758"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=22758"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=22758"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}