Achieving high inter-rater reliability in establishing data labels: a retrospective chart review study

Background In medical research, the effectiveness of machine learning algorithms depends heavily on the accuracy of labeled data. This study aimed to assess inter-rater reliability (IRR) in a retrospective electronic medical chart review to create high quality labeled data on comorbidities and adver...

Bibliographic Details
Main Authors: Hude Quan, Danielle A Southern, Yuan Xu, Cathy Eastwood, Guosong Wu, Natalie Sapiro, Cheligeer Cheligeer
Format: Article
Language: English
Published: BMJ Publishing Group 2024-04-01
Series: BMJ Open Quality
Online Access: https://bmjopenquality.bmj.com/content/13/2/e002722.full
_version_ 1797202415577989120
author Hude Quan
Danielle A Southern
Yuan Xu
Cathy Eastwood
Guosong Wu
Natalie Sapiro
Cheligeer Cheligeer
collection DOAJ
description Background In medical research, the effectiveness of machine learning algorithms depends heavily on the accuracy of labeled data. This study aimed to assess inter-rater reliability (IRR) in a retrospective electronic medical chart review to create high quality labeled data on comorbidities and adverse events (AEs). Methods Six registered nurses with diverse clinical backgrounds reviewed patient charts, extracted data on 20 predefined comorbidities and 18 AEs. All reviewers underwent four iterative rounds of training aimed to enhance accuracy and foster consensus. Periodic monitoring was conducted at the beginning, middle, and end of the testing phase to ensure data quality. Weighted Kappa coefficients were calculated with their associated 95% confidence intervals (CIs). Results Seventy patient charts were reviewed. The overall agreement, measured by Conger's Kappa, was 0.80 (95% CI: 0.78-0.82). IRR scores remained consistently high (ranging from 0.70 to 0.87) throughout each phase. Conclusion Our study suggests the detailed manual for chart review and structured training regimen resulted in a consistently high level of agreement among our reviewers during the chart review process. This establishes a robust foundation for generating high-quality labeled data, thereby enhancing the potential for developing accurate machine learning algorithms.
first_indexed 2024-04-24T08:03:05Z
format Article
id doaj.art-b62f38fc2ce540f8a457a5b46e4e84e7
institution Directory Open Access Journal
issn 2399-6641
language English
last_indexed 2024-04-24T08:03:05Z
publishDate 2024-04-01
publisher BMJ Publishing Group
record_format Article
series BMJ Open Quality
spelling doaj.art-b62f38fc2ce540f8a457a5b46e4e84e7
2024-04-17T16:10:09Z
eng
BMJ Publishing Group
BMJ Open Quality
2399-6641
2024-04-01
13
2
10.1136/bmjoq-2023-002722
Achieving high inter-rater reliability in establishing data labels: a retrospective chart review study
Hude Quan 0
Danielle A Southern 1
Yuan Xu 2
Cathy Eastwood 3
Guosong Wu 4
Natalie Sapiro 5
Cheligeer Cheligeer 6
Community Health Sciences, University of Calgary Cumming School of Medicine, Calgary, Alberta, Canada
Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
Department of Surgery, University of Calgary, Calgary, Alberta, Canada
Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
Centre for Health Informatics, Department of Community Health Sciences, University of Calgary, Calgary, Alberta, Canada
Alberta Health Services, Calgary, Alberta, Canada
https://bmjopenquality.bmj.com/content/13/2/e002722.full
title Achieving high inter-rater reliability in establishing data labels: a retrospective chart review study
url https://bmjopenquality.bmj.com/content/13/2/e002722.full