Kappa statistic considerations in evaluating inter-rater reliability between two raters: which, when and context matters
Abstract: | Background: In research designs that rely on observational ratings provided by two raters, assessing inter-rater reliability (IRR) is a frequently required task. However, some studies fall short in properly applying statistical procedures, omit information essential for interpreting their findings, or inadequately address the impact of IRR on the statistical power of subsequent hypothesis tests. Methods: This article examines the recent publication by Liu et al. in BMC Cancer, analyzing the controversy surrounding the Kappa statistic and methodological issues in assessing IRR. The primary focus is the appropriate selection of Kappa statistics, along with the computation, interpretation, and reporting of two frequently used IRR statistics when two raters are involved. Results: Cohen's Kappa is typically used to assess agreement between two raters when there are two categories, or for unordered categorical variables with three or more categories. The weighted Kappa, in contrast, is the widely used measure of agreement between two raters for ordered categorical variables with three or more categories. Conclusion: Although it does not substantially affect the findings of Liu et al.'s study, the statistical dispute underscores the importance of employing suitable statistical methods. Rigorous and accurate statistical results are crucial for producing trustworthy research. |
Main Authors: | Ming Li, Qian Gao, Tianfei Yu |
Affiliations: | Department of Computer Science and Technology, College of Computer and Control Engineering, Qiqihar University (Ming Li, Qian Gao); Department of Biotechnology, College of Life Science and Agriculture Forestry, Qiqihar University (Tianfei Yu) |
Format: | Article |
Language: | English |
Published: | BMC, 2023-08-01 |
Series: | BMC Cancer |
ISSN: | 1471-2407 |
Subjects: | RECIST 1.1 criteria; Liver metastases; DWI; Intra-rater reliability; Kappa statistic; Cohen's Kappa |
Online Access: | https://doi.org/10.1186/s12885-023-11325-z |
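The distinction drawn in the abstract (plain Cohen's Kappa for nominal ratings, weighted Kappa for ordinal ratings with three or more levels) can be made concrete with a short sketch. Cohen's Kappa is defined as κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance; the weighted variant additionally penalizes disagreements according to their distance on the ordinal scale. The code below is a minimal illustration, not taken from the article: it assumes scikit-learn is available and uses made-up ratings purely for demonstration.

```python
# Minimal sketch (not from the article): Cohen's Kappa and weighted Kappa
# for two raters, computed with scikit-learn. All ratings below are made up.
from sklearn.metrics import cohen_kappa_score

# Unordered (nominal) categories from two raters -> plain Cohen's Kappa.
rater_a = ["benign", "malignant", "benign", "benign", "malignant", "benign"]
rater_b = ["benign", "malignant", "malignant", "benign", "malignant", "benign"]
print("Cohen's Kappa:", cohen_kappa_score(rater_a, rater_b))

# Ordered (ordinal) categories with three or more levels -> weighted Kappa;
# quadratic weights penalize larger disagreements more heavily.
rater_a_ord = [1, 2, 3, 2, 1, 3, 2, 2]
rater_b_ord = [1, 3, 3, 2, 2, 3, 1, 2]
print("Weighted Kappa (quadratic):",
      cohen_kappa_score(rater_a_ord, rater_b_ord, weights="quadratic"))
```

Linear weights (`weights="linear"`) are an alternative when each step of disagreement should count equally; the choice of weighting scheme should be reported alongside the Kappa value.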