Inter- and Intra-Observer Agreement When Using a Diagnostic Labeling Scheme for Annotating Findings on Chest X-rays—An Early Step in the Development of a Deep Learning-Based Decision Support System

Consistent annotation of data is a prerequisite for the successful training and testing of artificial intelligence-based decision support systems in radiology. This can be obtained by standardizing terminology when annotating diagnostic images. The purpose of this study was to evaluate the annotation consistency among radiologists when using a novel diagnostic labeling scheme for chest X-rays. Six radiologists with experience ranging from one to sixteen years annotated a set of 100 fully anonymized chest X-rays. The blinded radiologists annotated on two separate occasions. Statistical analyses were performed using Randolph's kappa and PABAK, and the proportions of specific agreements were calculated. Fair-to-excellent agreement was found for all labels among the annotators (Randolph's kappa, 0.40–0.99). PABAK ranged from 0.12 to 1 for the two-reader inter-rater agreement and from 0.26 to 1 for the intra-rater agreement. Descriptive and broad labels achieved the highest proportion of positive agreement in both the inter- and intra-reader analyses. Annotating findings with specific, interpretive labels was found to be difficult for less experienced radiologists. Annotating images with descriptive labels may increase agreement between radiologists with different experience levels compared to annotation with interpretive labels.
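The abstract names two chance-corrected agreement statistics, Randolph's free-marginal kappa and PABAK. As a point of reference, the following is a minimal sketch of how these statistics are conventionally computed; the function names, the NumPy implementation, and the binary-label framing for PABAK are illustrative assumptions, not the authors' code.

```python
import numpy as np

def pabak(rater_a, rater_b):
    """Prevalence- and bias-adjusted kappa (PABAK) for two raters on a
    binary label: PABAK = 2 * p_o - 1, where p_o is observed agreement."""
    a = np.asarray(rater_a)
    b = np.asarray(rater_b)
    p_o = np.mean(a == b)  # proportion of cases on which the two raters agree
    return 2.0 * p_o - 1.0

def randolph_kappa(ratings, n_categories):
    """Randolph's free-marginal multirater kappa.
    `ratings` is an (n_subjects, n_raters) array of category indices."""
    ratings = np.asarray(ratings)
    n, r = ratings.shape
    # counts[i, j] = number of raters assigning subject i to category j
    counts = np.stack(
        [(ratings == j).sum(axis=1) for j in range(n_categories)], axis=1
    )
    # observed pairwise agreement, averaged over all subjects and rater pairs
    p_o = (counts * (counts - 1)).sum() / (n * r * (r - 1))
    p_e = 1.0 / n_categories  # chance agreement under free marginals
    return (p_o - p_e) / (1.0 - p_e)
```

For example, for six raters each labeling 100 images with a binary finding, `randolph_kappa(ratings, n_categories=2)` on a `(100, 6)` array reproduces the free-marginal statistic; with two categories it coincides with the PABAK formula above, which is why both reduce to 2·p_o − 1 in the binary case.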

Bibliographic Details
Main Authors: Dana Li, Lea Marie Pehrson, Lea Tøttrup, Marco Fraccaro, Rasmus Bonnevie, Jakob Thrane, Peter Jagd Sørensen, Alexander Rykkje, Tobias Thostrup Andersen, Henrik Steglich-Arnholm, Dorte Marianne Rohde Stærk, Lotte Borgwardt, Kristoffer Lindskov Hansen, Sune Darkner, Jonathan Frederik Carlsen, Michael Bachmann Nielsen
Format: Article
Language: English
Published: MDPI AG, 2022-12-01
Series: Diagnostics
ISSN: 2075-4418
DOI: 10.3390/diagnostics12123112
Subjects: artificial intelligence; chest X-ray; inter-rater; intra-rater; image annotation; diagnostic scheme
Online Access: https://www.mdpi.com/2075-4418/12/12/3112
Author Affiliations:
Department of Diagnostic Radiology, Copenhagen University Hospital, Rigshospitalet, 2100 Copenhagen, Denmark (Dana Li, Lea Marie Pehrson, Peter Jagd Sørensen, Alexander Rykkje, Tobias Thostrup Andersen, Henrik Steglich-Arnholm, Dorte Marianne Rohde Stærk, Lotte Borgwardt, Kristoffer Lindskov Hansen, Jonathan Frederik Carlsen, Michael Bachmann Nielsen)
Unumed Aps, 1055 Copenhagen, Denmark (Lea Tøttrup, Marco Fraccaro, Rasmus Bonnevie, Jakob Thrane)
Department of Computer Science, University of Copenhagen, 2100 Copenhagen, Denmark (Sune Darkner)