Kappa statistic considerations in evaluating inter-rater reliability between two raters: which, when and context matters

Abstract

Background: In research designs that rely on observational ratings from two raters, assessing inter-rater reliability (IRR) is a frequently required task. However, some studies fail to apply the appropriate statistical procedures, omit information essential for interpreting their findings, or inadequately address the impact of IRR on the statistical power of subsequent hypothesis tests.

Methods: This article examines the recent publication by Liu et al. in BMC Cancer, analyzing the controversy surrounding the Kappa statistic and methodological issues in the assessment of IRR. The primary focus is the appropriate selection of Kappa statistics, as well as the computation, interpretation, and reporting of two frequently used IRR statistics when two raters are involved.

Results: Cohen's Kappa is typically used to assess agreement between two raters for binary variables or for unordered categorical variables with three or more categories. For ordered categorical variables with three or more categories, the weighted Kappa is the widely used measure of agreement between two raters.

Conclusion: Although the statistical dispute does not substantially affect the findings of Liu et al.'s study, it underscores the importance of employing suitable statistical methods. Rigorous and accurate statistical results are crucial for producing trustworthy research.
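To make the distinction in the Results concrete, the sketch below shows how both statistics can be computed for two raters. It is an illustration only and is not taken from the article: the ratings are hypothetical ordinal categories, and scikit-learn's cohen_kappa_score is used, where the weights argument switches between the unweighted and weighted forms.

# Minimal sketch (not from the article): Cohen's Kappa vs. weighted Kappa for two raters.
# Cohen's Kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
# p_e is the agreement expected by chance.
from sklearn.metrics import cohen_kappa_score

# Hypothetical ordinal ratings from two raters (e.g., 1 = response, 2 = stable, 3 = progression).
rater_a = [1, 2, 2, 3, 1, 3, 2, 1, 3, 2]
rater_b = [1, 2, 3, 3, 1, 2, 2, 1, 3, 3]

# Unweighted Cohen's Kappa: suitable for binary or unordered (nominal) categories.
kappa = cohen_kappa_score(rater_a, rater_b)

# Weighted Kappa: suitable for ordered categories with three or more levels;
# "linear" or "quadratic" weights penalize larger disagreements more heavily.
weighted_kappa = cohen_kappa_score(rater_a, rater_b, weights="quadratic")

print(f"Cohen's Kappa:  {kappa:.3f}")
print(f"Weighted Kappa: {weighted_kappa:.3f}")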

Bibliographic Details
Main Authors: Ming Li, Qian Gao, Tianfei Yu
Author Affiliations: Department of Computer Science and Technology, College of Computer and Control Engineering, Qiqihar University (Ming Li, Qian Gao); Department of Biotechnology, College of Life Science and Agriculture Forestry, Qiqihar University (Tianfei Yu)
Format: Article
Language: English
Published: BMC, 2023-08-01
Series: BMC Cancer
ISSN: 1471-2407
Collection: Directory of Open Access Journals (DOAJ)
Subjects: RECIST 1.1 criteria; Liver metastases; DWI; Intra-rater reliability; Kappa statistic; Cohen's Kappa
Online Access: https://doi.org/10.1186/s12885-023-11325-z