Mining of EHR for interface terminology concepts for annotating EHRs of COVID patients

Bibliographic Details
Main Authors: Vipina K. Keloth, Shuxin Zhou, Luke Lindemann, Ling Zheng, Gai Elhanan, Andrew J. Einstein, James Geller, Yehoshua Perl
Format: Article
Language: English
Published: BMC 2023-02-01
Series: BMC Medical Informatics and Decision Making
Subjects: Interface terminology; COVID-19 ontologies; Concept mining; EHR annotation
Online Access: https://doi.org/10.1186/s12911-023-02136-0
author Vipina K. Keloth
Shuxin Zhou
Luke Lindemann
Ling Zheng
Gai Elhanan
Andrew J. Einstein
James Geller
Yehoshua Perl
author_sort Vipina K. Keloth
collection DOAJ
description Abstract
Background: Two years into the COVID-19 pandemic and with more than five million deaths worldwide, the healthcare establishment continues to struggle with every new wave of the pandemic resulting from a new coronavirus variant. Research has demonstrated that there are variations in the symptoms, and even in the order of symptom presentations, in COVID-19 patients infected by different SARS-CoV-2 variants (e.g., Alpha and Omicron). Textual data in the form of admission notes and physician notes in the Electronic Health Records (EHRs) is rich in information regarding the symptoms and their orders of presentation. Unstructured EHR data is often underutilized in research due to the lack of annotations that enable automatic extraction of useful information from the available extensive volumes of textual data.
Methods: We present the design of a COVID Interface Terminology (CIT), not just a generic COVID-19 terminology, but one serving a specific purpose of enabling automatic annotation of EHRs of COVID-19 patients. CIT was constructed by integrating existing COVID-related ontologies and mining additional fine granularity concepts from clinical notes. The iterative mining approach utilized the techniques of 'anchoring' and 'concatenation' to identify potential fine granularity concepts to be added to the CIT. We also tested the generalizability of our approach on a hold-out dataset and compared the annotation coverage to the coverage obtained for the dataset used to build the CIT.
Results: Our experiments demonstrate that this approach results in higher annotation coverage compared to existing ontologies such as SNOMED CT and Coronavirus Infectious Disease Ontology (CIDO). The final version of CIT achieved about 20% more coverage than SNOMED CT and 50% more coverage than CIDO. In the future, the concepts mined and added into CIT could be used as training data for machine learning models for mining even more concepts into CIT and further increasing the annotation coverage.
Conclusion: In this paper, we demonstrated the construction of a COVID interface terminology that can be utilized for automatically annotating EHRs of COVID-19 patients. The techniques presented can identify frequently documented fine granularity concepts that are missing in other ontologies thereby increasing the annotation coverage.
first_indexed 2024-04-09T22:52:47Z
format Article
id doaj.art-55fcbb97e35f4ff28f9c7fa353236747
institution Directory Open Access Journal
issn 1472-6947
language English
last_indexed 2025-03-20T12:03:10Z
publishDate 2023-02-01
publisher BMC
record_format Article
series BMC Medical Informatics and Decision Making
spelling doaj.art-55fcbb97e35f4ff28f9c7fa353236747
2024-09-15T11:22:24Z
eng
BMC
BMC Medical Informatics and Decision Making
1472-6947
2023-02-01
volume 23, issue S1, pages 1-18
10.1186/s12911-023-02136-0
Mining of EHR for interface terminology concepts for annotating EHRs of COVID patients
Vipina K. Keloth (School of Biomedical Informatics, University of Texas Health Science Center at Houston)
Shuxin Zhou (Department of Computer Science, New Jersey Institute of Technology)
Luke Lindemann (School of Medicine and Health Sciences, The George Washington University)
Ling Zheng (Computer Science and Software Engineering Department, Monmouth University)
Gai Elhanan (Renown Institute for Health Innovation, Desert Research Institute)
Andrew J. Einstein (Cardiology Division, Department of Medicine, Columbia University Irving Medical Center)
James Geller (Department of Computer Science, New Jersey Institute of Technology)
Yehoshua Perl (Department of Computer Science, New Jersey Institute of Technology)
https://doi.org/10.1186/s12911-023-02136-0
Interface terminology
COVID-19 ontologies
Concept mining
EHR annotation
title Mining of EHR for interface terminology concepts for annotating EHRs of COVID patients
topic Interface terminology
COVID-19 ontologies
Concept mining
EHR annotation
url https://doi.org/10.1186/s12911-023-02136-0