Performance and Agreement When Annotating Chest X-ray Text Reports—A Preliminary Step in the Development of a Deep Learning-Based Prioritization and Detection System

A chest X-ray report is a communicative tool and can be used as data for developing artificial intelligence-based decision support systems. For both purposes, consistent understanding and labeling are important. Our aim was to investigate how readers would comprehend and annotate 200 chest X-ray reports. Reports written between 1 January 2015 and 11 March 2022 were selected based on search words. Annotators included three board-certified radiologists, two trained radiologists (physicians), two radiographers (radiological technicians), a non-radiological physician, and a medical student. Consensus labels agreed upon by two or more of the experienced radiologists were considered the “gold standard”. The Matthews correlation coefficient (MCC) was calculated to assess annotation performance, and descriptive statistics were used to assess agreement between individual annotators and labels. The intermediate radiologist had the best correlation to the “gold standard” (MCC 0.77), followed by the novice radiologist and the medical student (MCC 0.71 for both), the novice radiographer (MCC 0.65), the non-radiological physician (MCC 0.64), and the experienced radiographer (MCC 0.57). Our findings showed that, when developing an artificial intelligence-based support system and trained radiologists are not available, annotations from non-radiological annotators with basic and general knowledge may align more closely with radiologists' annotations than those from sub-specialized medical staff whose sub-specialization lies outside diagnostic radiology.
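For readers reproducing the evaluation described above: comparing an annotator's labels against the consensus “gold standard” with the Matthews correlation coefficient can be sketched as below. This is a minimal illustration only, not the authors' code; the example labels and the use of scikit-learn's matthews_corrcoef are assumptions for demonstration.

```python
# Illustrative sketch (not the study's code): compare one annotator's binary
# per-report labels for a single finding against the consensus "gold standard"
# using the Matthews correlation coefficient (MCC).
from sklearn.metrics import matthews_corrcoef

# Hypothetical labels: 1 = finding annotated as present, 0 = absent.
gold_standard = [1, 0, 0, 1, 1, 0, 1, 0]  # consensus of >= 2 experienced radiologists
annotator = [1, 0, 1, 1, 0, 0, 1, 0]      # one annotator's labels for the same reports

print(f"MCC vs. gold standard: {matthews_corrcoef(gold_standard, annotator):.2f}")
```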

Bibliographic Details
Main Authors: Dana Li, Lea Marie Pehrson, Rasmus Bonnevie, Marco Fraccaro, Jakob Thrane, Lea Tøttrup, Carsten Ammitzbøl Lauridsen, Sedrah Butt Balaganeshan, Jelena Jankovic, Tobias Thostrup Andersen, Alyas Mayar, Kristoffer Lindskov Hansen, Jonathan Frederik Carlsen, Sune Darkner, Michael Bachmann Nielsen
Format: Article
Language: English
Published: MDPI AG, 2023-03-01
Series: Diagnostics, Vol. 13, Issue 6, Article 1070
ISSN: 2075-4418
DOI: 10.3390/diagnostics13061070
Subjects: chest X-ray; deep learning; artificial intelligence; agreement; performance; text annotation
Online Access: https://www.mdpi.com/2075-4418/13/6/1070

Author Affiliations:
Department of Diagnostic Radiology, Copenhagen University Hospital, Rigshospitalet, 2100 Copenhagen, Denmark: Dana Li, Lea Marie Pehrson, Carsten Ammitzbøl Lauridsen, Jelena Jankovic, Tobias Thostrup Andersen, Kristoffer Lindskov Hansen, Jonathan Frederik Carlsen, Michael Bachmann Nielsen
Unumed Aps, 1055 Copenhagen, Denmark: Rasmus Bonnevie, Marco Fraccaro, Jakob Thrane, Lea Tøttrup
Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2100 Copenhagen, Denmark: Sedrah Butt Balaganeshan
Department of Health Sciences, Panum Institute, University of Copenhagen, 2100 Copenhagen, Denmark: Alyas Mayar
Department of Computer Science, University of Copenhagen, 2100 Copenhagen, Denmark: Sune Darkner