The Benefits of the Matthews Correlation Coefficient (MCC) Over the Diagnostic Odds Ratio (DOR) in Binary Classification Assessment

Bibliographic Details
Main Authors: Davide Chicco, Valery Starovoitov, Giuseppe Jurman
Format: Article
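Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Subjects: Matthews correlation coefficient; diagnostic odds ratio; binary classification; confusion matrix; supervised machine learning; confusion tetrahedron
Online Access: https://ieeexplore.ieee.org/document/9385097/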
author Davide Chicco
Valery Starovoitov
Giuseppe Jurman
collection DOAJ
description To assess the quality of a binary classification, researchers often use a four-entry contingency table called a confusion matrix, which contains the true positives, true negatives, false positives, and false negatives. To summarize the four values of a confusion matrix in a single score, researchers and statisticians have developed several rates and metrics. Several earlier studies showed why the Matthews correlation coefficient (MCC) is more informative and trustworthy than confusion-entropy error, accuracy, F1 score, bookmaker informedness, markedness, and balanced accuracy. In this study, we compare the MCC with the diagnostic odds ratio (DOR), a statistical rate sometimes employed in biomedical sciences. After examining the properties of the MCC and of the DOR, we describe the relationships between them, also using an innovative geometrical plot called the confusion tetrahedron, presented here for the first time. We then report some use cases where the MCC and the DOR produce discordant outcomes, and explain why the Matthews correlation coefficient is the more informative and reliable of the two. Our results can have a strong impact in computer science and statistics, because they clearly explain why the information provided by the Matthews correlation coefficient is more trustworthy than the information provided by the diagnostic odds ratio.
first_indexed 2024-04-11T11:45:10Z
format Article
id doaj.art-22f242c019044bd58466d5cc60b87d35
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-11T11:45:10Z
publishDate 2021-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-22f242c019044bd58466d5cc60b87d35
2022-12-22T04:25:38Z
eng
IEEE
IEEE Access
2169-3536
2021-01-01
Volume 9, pages 47112-47124
DOI 10.1109/ACCESS.2021.3068614
9385097
Davide Chicco (0) https://orcid.org/0000-0001-9655-7142
Valery Starovoitov (1) https://orcid.org/0000-0001-7190-761X
Giuseppe Jurman (2) https://orcid.org/0000-0002-2705-5728
(0) Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, ON, Canada
(1) National Academy of Sciences of Belarus, Minsk, Belarus
(2) Fondazione Bruno Kessler, Trento, Italy
https://ieeexplore.ieee.org/document/9385097/
title The Benefits of the Matthews Correlation Coefficient (MCC) Over the Diagnostic Odds Ratio (DOR) in Binary Classification Assessment
topic Matthews correlation coefficient
diagnostic odds ratio
binary classification
confusion matrix
supervised machine learning
confusion tetrahedron
url https://ieeexplore.ieee.org/document/9385097/
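
The description in this record compares two scores computed from the four confusion matrix entries. As a minimal sketch of how they can disagree, the Python below uses the standard textbook definitions of the MCC and the DOR. The function names, the handling of zero denominators, and the example counts are illustrative assumptions and are not taken from the article.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion matrix counts.

    Assumption: return 0.0 when any marginal sum is zero, a common
    convention for the otherwise undefined 0/0 case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def dor(tp, tn, fp, fn):
    """Diagnostic odds ratio (TP * TN) / (FP * FN).

    Assumption: report infinity when FP * FN is zero and TP * TN is
    positive, and NaN when both products are zero.
    """
    num = tp * tn
    den = fp * fn
    if den == 0:
        return math.inf if num > 0 else math.nan
    return num / den


# Hypothetical counts: only 1 of 100 positives is detected,
# and no negative is misclassified.
tp, fn, tn, fp = 1, 99, 100, 0
print("MCC:", round(mcc(tp, tn, fp, fn), 4))
print("DOR:", dor(tp, tn, fp, fn))
```

With these hypothetical counts the DOR is unbounded because there are no false positives, while the MCC stays close to zero because 99 of the 100 positives are missed. This is one way the two scores can give discordant readings of the same confusion matrix.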