Unlocking echocardiogram measurements for heart disease research through natural language processing

Abstract Background In order to investigate the mechanisms of cardiovascular disease in HIV-infected and uninfected patients, an analysis of echocardiogram reports is required for a large longitudinal multi-center study. Implementation A natural language processing system using a dictionary lookup,...

Bibliographic Details
Main Authors: Olga V. Patterson, Matthew S. Freiberg, Melissa Skanderson, Samah J. Fodeh, Cynthia A. Brandt, Scott L. DuVall
Format: Article
Language: English
Published: BMC 2017-06-01
Series: BMC Cardiovascular Disorders
Subjects:
Online Access: http://link.springer.com/article/10.1186/s12872-017-0580-8
collection DOAJ
description Abstract
Background: In order to investigate the mechanisms of cardiovascular disease in HIV-infected and uninfected patients, an analysis of echocardiogram reports is required for a large longitudinal multi-center study.
Implementation: A natural language processing system using a dictionary lookup, rules, and patterns was developed to extract heart function measurements that are typically recorded in echocardiogram reports as measurement-value pairs. Curated semantic bootstrapping was used to create a custom dictionary that extends existing terminologies based on terms that actually appear in the medical record. A novel disambiguation method based on semantic constraints was created to identify and discard erroneous alternative definitions of the measurement terms. The system was built utilizing a scalable framework, making it available for processing large datasets.
Results: The system was developed for and validated on notes from three sources: general clinic notes, echocardiogram reports, and radiology reports. The system achieved F-scores of 0.872, 0.844, and 0.877 with precision of 0.936, 0.982, and 0.969 for each dataset respectively, averaged across all extracted values. Left ventricular ejection fraction (LVEF) is the most frequently extracted measurement. The precision of extraction of the LVEF measure ranged from 0.968 to 1.0 across different document types.
Conclusions: This system illustrates the feasibility and effectiveness of large-scale information extraction on clinical data. New clinical questions can be addressed in the domain of heart failure using retrospective clinical data analysis because key heart function measurements can be successfully extracted using natural language processing.
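The abstract describes extraction of measurement-value pairs through dictionary lookup, patterns, and semantic constraints, but this record does not include the implementation. The following is only a minimal Python sketch of that general idea, not the authors' system: the term dictionary `MEASUREMENT_TERMS`, the regular expression `PAIR_PATTERN`, the function `extract_measurements`, and the 0 to 100 percent plausibility check are all hypothetical and chosen purely for illustration.

```python
import re

# Hypothetical mini-dictionary (an assumption, not the authors' curated
# dictionary): surface term -> canonical measurement name.
MEASUREMENT_TERMS = {
    "left ventricular ejection fraction": "LVEF",
    "ejection fraction": "LVEF",
    "lvef": "LVEF",
    "ef": "LVEF",
}

# Longest terms first so the full phrase is preferred over the bare "ef".
_TERM_ALTERNATION = "|".join(
    re.escape(term) for term in sorted(MEASUREMENT_TERMS, key=len, reverse=True)
)

# Illustrative pattern: a dictionary term, a short non-numeric filler
# ("is", "of", ":", "estimated at"), then a value or a value range.
PAIR_PATTERN = re.compile(
    r"\b(?P<term>" + _TERM_ALTERNATION + r")\b"
    r"[^0-9\n]{0,30}?"
    r"(?P<low>\d{1,3}(?:\.\d+)?)(?!\d)"
    r"(?:\s*(?:-|to)\s*(?P<high>\d{1,3}(?:\.\d+)?)(?!\d))?"
    r"\s*(?P<unit>%)?",
    re.IGNORECASE,
)


def extract_measurements(text):
    """Return measurement-value pairs found in free text."""
    results = []
    for match in PAIR_PATTERN.finditer(text):
        low = float(match.group("low"))
        high = float(match.group("high")) if match.group("high") else low
        # Assumed semantic constraint: an ejection fraction is a percentage,
        # so values outside 0..100 (or inverted ranges) are discarded.
        if not (0 <= low <= high <= 100):
            continue
        results.append(
            {
                "measurement": MEASUREMENT_TERMS[match.group("term").lower()],
                "low": low,
                "high": high,
                "unit": match.group("unit") or "",
            }
        )
    return results


if __name__ == "__main__":
    note = "LVEF 55%. Ejection fraction is estimated at 55-60%. EF 250."
    for pair in extract_measurements(note):
        print(pair)
```

In this sketch the range check stands in, very loosely, for the semantic-constraint disambiguation mentioned in the abstract: a candidate whose value cannot be an ejection fraction is dropped rather than reported.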
id doaj.art-25e6fb3ae444449d8fe4e55d9510420a
institution Directory of Open Access Journals
issn 1471-2261
doi 10.1186/s12872-017-0580-8
affiliation Olga V. Patterson: Department of Veterans Affairs Salt Lake City Health Care System
Matthew S. Freiberg: VA Tennessee Valley Health Care System
Melissa Skanderson: Connecticut VA Healthcare System
Samah J. Fodeh: Center for Medical Informatics, School of Medicine, Yale University
Cynthia A. Brandt: Connecticut VA Healthcare System
Scott L. DuVall: Department of Veterans Affairs Salt Lake City Health Care System
topic Natural language processing
Text mining
Information extraction
Echocardiography
Heart function
Left ventricular ejection fraction