Development and Evaluation of a Natural Language Processing System for Curating a Trans-Thoracic Echocardiogram (TTE) Database

Bibliographic Details
Main Authors: Tim Dong, Nicholas Sunderland, Angus Nightingale, Daniel P. Fudulu, Jeremy Chan, Ben Zhai, Alberto Freitas, Massimo Caputo, Arnaldo Dimagli, Stuart Mires, Mike Wyatt, Umberto Benedetto, Gianni D. Angelini
Format: Article
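Language: English
Published: MDPI AG 2023-11-01
Series: Bioengineering
Subjects: electronic health records (EHR); Big Data; unstructured data; echo report; echocardiography analysis; natural language processing (NLP)
Online Access: https://www.mdpi.com/2306-5354/10/11/1307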
author Tim Dong
Nicholas Sunderland
Angus Nightingale
Daniel P. Fudulu
Jeremy Chan
Ben Zhai
Alberto Freitas
Massimo Caputo
Arnaldo Dimagli
Stuart Mires
Mike Wyatt
Umberto Benedetto
Gianni D. Angelini
collection DOAJ
description Background: Although electronic health records (EHR) provide useful insights into disease patterns and patient treatment optimisation, their reliance on unstructured data makes them difficult to analyse. Echocardiography reports, which provide extensive pathology information for cardiovascular patients, are particularly challenging to extract and analyse because of their narrative structure. Although natural language processing (NLP) has been used successfully in a variety of medical fields, it is not commonly used in echocardiography analysis.
Objectives: To develop an NLP-based approach for extracting and categorising data from echocardiography reports by accurately converting continuous (e.g., LVOT VTI, AV VTI and TR Vmax) and discrete (e.g., regurgitation severity) outcomes from a semi-structured narrative format into a structured and categorised format, allowing for future research or clinical use.
Methods: 135,062 Trans-Thoracic Echocardiogram (TTE) reports were derived from 146,967 baseline echocardiogram reports and split into three cohorts: Training and Validation (n = 1,075), Test Dataset (n = 98) and Application Dataset (n = 133,889). The NLP system was developed and iteratively refined using medical expert knowledge. The system was used to curate a moderate-fidelity database from extractions of 133,889 reports. A hold-out validation set of 98 reports was blindly annotated and extracted by two clinicians for comparison with the NLP extraction. Agreement, discrimination, accuracy and calibration of outcome measure extractions were evaluated.
Results: Continuous outcomes including LVOT VTI, AV VTI and TR Vmax exhibited perfect inter-rater reliability as measured by intra-class correlation scores (ICC = 1.00, p < 0.05) alongside high R² values, demonstrating an ideal alignment between the NLP system and clinicians. A good level (ICC = 0.75–0.9, p < 0.05) of inter-rater reliability was observed for outcomes such as LVOT Diam, Lateral MAPSE, Peak E Velocity, Lateral E’ Velocity, PV Vmax, Sinuses of Valsalva and Ascending Aorta diameters. Furthermore, the accuracy rate for discrete outcome measures was 91.38% in the confusion matrix analysis, indicating effective performance.
Conclusions: The NLP-based technique performed well in extracting and categorising data from echocardiography reports. The system demonstrated a high degree of agreement and concordance with clinician extractions. This study contributes to the effective use of semi-structured data by providing a useful tool for converting semi-structured text to a structured echo report that can be used for data management. Additional validation and implementation in healthcare settings can improve data availability and support research and clinical decision-making.
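As an illustration only, the kind of conversion named in the Objectives (continuous and discrete outcomes pulled from semi-structured text into a structured row) might be sketched as below. This is not the authors' system: the report layout, the regular expressions, the severity vocabulary and the function name `extract` are all assumptions made for the example.

```python
import re

# Hypothetical labels for continuous measures; the real report layout is not
# given in this record, so "Label [:=] number" is an assumption.
CONTINUOUS = {
    "lvot_vti": r"\bLVOT\s+VTI",
    "av_vti": r"\bAV\s+VTI",
    "tr_vmax": r"\bTR\s+Vmax",
}
NUMBER = r"(\d+(?:\.\d+)?)"

# Hypothetical severity vocabulary and valve list for discrete outcomes.
SEVERITY = r"\b(trivial|mild|moderate|severe)"
VALVES = ("mitral", "aortic", "tricuspid")


def extract(report: str) -> dict:
    """Return one structured row; a measure that is not found is None."""
    row = {}
    for key, label in CONTINUOUS.items():
        match = re.search(label + r"\s*[:=]?\s*" + NUMBER, report, re.IGNORECASE)
        row[key] = float(match.group(1)) if match else None
    for valve in VALVES:
        pattern = SEVERITY + r"\s+" + valve + r"\s+regurgitation"
        match = re.search(pattern, report, re.IGNORECASE)
        row[valve + "_regurgitation"] = match.group(1).lower() if match else None
    return row


# Invented example text, not taken from the study data.
example = (
    "LVOT VTI: 21.5 cm\n"
    "AV VTI = 38 cm\n"
    "TR Vmax 2.6 m/s\n"
    "Mild mitral regurgitation. No aortic regurgitation."
)
row = extract(example)
```

Rows produced this way could then be compared field by field with clinician annotations, which is the comparison the Methods describe for the 98-report test set.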
first_indexed 2024-03-09T17:00:57Z
format Article
id doaj.art-c6bad10e1eda4524acac3ba8cfc78a40
institution Directory Open Access Journal
issn 2306-5354
language English
last_indexed 2024-03-09T17:00:57Z
publishDate 2023-11-01
publisher MDPI AG
record_format Article
series Bioengineering
spelling doaj.art-c6bad10e1eda4524acac3ba8cfc78a40
2023-11-24T14:29:54Z
eng
MDPI AG
Bioengineering
2306-5354
2023-11-01
Volume 10, Issue 11, Article 1307
10.3390/bioengineering10111307
Tim Dong: Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
Nicholas Sunderland: Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
Angus Nightingale: Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
Daniel P. Fudulu: Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
Jeremy Chan: Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
Ben Zhai: School of Computing Science, Northumbria University, Newcastle upon Tyne NE1 8ST, UK
Alberto Freitas: Faculty of Medicine, University of Porto, 4100 Porto, Portugal
Massimo Caputo: Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
Arnaldo Dimagli: Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
Stuart Mires: Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
Mike Wyatt: University Hospitals Bristol and Weston, Marlborough St, Bristol BS1 3NU, UK
Umberto Benedetto: Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
Gianni D. Angelini: Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
title Development and Evaluation of a Natural Language Processing System for Curating a Trans-Thoracic Echocardiogram (TTE) Database
topic electronic health records (EHR)
Big Data
unstructured data
echo report
echocardiography analysis
natural language processing (NLP)
url https://www.mdpi.com/2306-5354/10/11/1307