Inter- and Intra-Observer Agreement When Using a Diagnostic Labeling Scheme for Annotating Findings on Chest X-rays—An Early Step in the Development of a Deep Learning-Based Decision Support System
Consistent annotation of data is a prerequisite for the successful training and testing of artificial intelligence-based decision support systems in radiology. This can be obtained by standardizing terminology when annotating diagnostic images. The purpose of this study was to evaluate the annotation consistency among radiologists when using a novel diagnostic labeling scheme for chest X-rays. Six radiologists with experience ranging from one to sixteen years annotated a set of 100 fully anonymized chest X-rays. The blinded radiologists annotated on two separate occasions. Statistical analyses were performed using Randolph’s kappa and PABAK, and the proportions of specific agreements were calculated. Fair-to-excellent agreement was found for all labels among the annotators (Randolph’s kappa, 0.40–0.99). The PABAK ranged from 0.12 to 1 for the two-reader inter-rater agreement and from 0.26 to 1 for the intra-rater agreement. Descriptive and broad labels achieved the highest proportion of positive agreement in both the inter- and intra-reader analyses. Annotating findings with specific, interpretive labels was found to be difficult for less experienced radiologists. Annotating images with descriptive labels may increase agreement between radiologists with different experience levels compared to annotation with interpretive labels.
Main Authors: Dana Li, Lea Marie Pehrson, Lea Tøttrup, Marco Fraccaro, Rasmus Bonnevie, Jakob Thrane, Peter Jagd Sørensen, Alexander Rykkje, Tobias Thostrup Andersen, Henrik Steglich-Arnholm, Dorte Marianne Rohde Stærk, Lotte Borgwardt, Kristoffer Lindskov Hansen, Sune Darkner, Jonathan Frederik Carlsen, Michael Bachmann Nielsen
Format: Article
Language: English
Published: MDPI AG, 2022-12-01
Series: Diagnostics (vol. 12, no. 12, art. 3112)
ISSN: 2075-4418
DOI: 10.3390/diagnostics12123112
Collection: DOAJ (Directory of Open Access Journals)
Subjects: artificial intelligence; chest X-ray; inter-rater; intra-rater; image annotation; diagnostic scheme
Online Access: https://www.mdpi.com/2075-4418/12/12/3112
Author Affiliations:
Department of Diagnostic Radiology, Copenhagen University Hospital, Rigshospitalet, 2100 Copenhagen, Denmark: Dana Li, Lea Marie Pehrson, Peter Jagd Sørensen, Alexander Rykkje, Tobias Thostrup Andersen, Henrik Steglich-Arnholm, Dorte Marianne Rohde Stærk, Lotte Borgwardt, Kristoffer Lindskov Hansen, Jonathan Frederik Carlsen, Michael Bachmann Nielsen
Unumed Aps, 1055 Copenhagen, Denmark: Lea Tøttrup, Marco Fraccaro, Rasmus Bonnevie, Jakob Thrane
Department of Computer Science, University of Copenhagen, 2100 Copenhagen, Denmark: Sune Darkner
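As context for the agreement statistics named in the abstract: PABAK (prevalence- and bias-adjusted kappa) is defined as 2*p_o - 1, and Randolph’s free-marginal kappa as (p_o - 1/k) / (1 - 1/k), where p_o is the observed proportion of agreement and k the number of label categories. Below is a minimal Python sketch of the two-rater, binary-label case; the rater data and function names are hypothetical illustrations, not taken from the study.

```python
# Minimal sketch of the two agreement statistics named in the abstract,
# for two raters assigning a binary label (finding present / absent) to
# each image. The example data below is hypothetical, not from the study.

def observed_agreement(a, b):
    """Proportion of images on which the two raters gave the same label."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)

def pabak(a, b):
    """Prevalence- and bias-adjusted kappa: PABAK = 2 * p_o - 1."""
    return 2 * observed_agreement(a, b) - 1

def randolph_kappa(a, b, k=2):
    """Randolph's free-marginal kappa: (p_o - 1/k) / (1 - 1/k),
    where k is the number of label categories."""
    p_o = observed_agreement(a, b)
    return (p_o - 1 / k) / (1 - 1 / k)

if __name__ == "__main__":
    # Hypothetical annotations for 10 chest X-rays (1 = finding present).
    rater_1 = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
    rater_2 = [1, 0, 1, 1, 1, 0, 0, 0, 0, 0]
    print(f"p_o = {observed_agreement(rater_1, rater_2):.2f}")           # 0.80
    print(f"PABAK = {pabak(rater_1, rater_2):.2f}")                      # 0.60
    print(f"Randolph's kappa = {randolph_kappa(rater_1, rater_2):.2f}")  # 0.60
```

Note that for binary labels the two formulas coincide at 2*p_o - 1; Randolph’s kappa generalizes to more label categories and, via average pairwise agreement, to more than two raters, as in the study’s six-reader analysis.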