Entity and relation extraction from clinical case reports of COVID-19: a natural language processing approach

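Abstract
Background: Extracting relevant information about infectious diseases is an essential task. However, a significant obstacle in supporting public health research is the lack of methods for effectively mining large amounts of health data.
Objective: This study aims to use natural language processing (NLP) to extract key information (clinical factors, social determinants of health) from published cases in the literature.
Methods: The proposed framework integrates a data layer for preparing a data cohort from clinical case reports; an NLP layer to find the clinical and demographic named entities and relations in the texts; and an evaluation layer for benchmarking performance and analysis. The focus of this study is to extract valuable information from COVID-19 case reports.
Results: The named entity recognition implementation in the NLP layer achieves a performance gain of about 1–3% compared to benchmark methods. Furthermore, even without extensive data labeling, the relation extraction method outperforms benchmark methods in accuracy (by 1–8%). A thorough examination reveals the disease's presence and the prevalence of symptoms in patients.
Conclusions: A similar approach can be generalized to other infectious diseases. It is worthwhile to use prior knowledge acquired through transfer learning when researching other infectious diseases.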


Bibliographic Details
Main Authors: Shaina Raza, Brian Schwartz
Format: Article
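Language: English
Published: BMC 2023-01-01
Series: BMC Medical Informatics and Decision Making
Subjects: Natural language processing; Data cohort; COVID-19; Named entity; Relation extraction; Transfer learning
Online Access: https://doi.org/10.1186/s12911-023-02117-3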
author Shaina Raza
Brian Schwartz
collection DOAJ
description Abstract. Background: Extracting relevant information about infectious diseases is an essential task. However, a significant obstacle in supporting public health research is the lack of methods for effectively mining large amounts of health data. Objective: This study aims to use natural language processing (NLP) to extract key information (clinical factors, social determinants of health) from published cases in the literature. Methods: The proposed framework integrates a data layer for preparing a data cohort from clinical case reports; an NLP layer to find the clinical and demographic named entities and relations in the texts; and an evaluation layer for benchmarking performance and analysis. The focus of this study is to extract valuable information from COVID-19 case reports. Results: The named entity recognition implementation in the NLP layer achieves a performance gain of about 1–3% compared to benchmark methods. Furthermore, even without extensive data labeling, the relation extraction method outperforms benchmark methods in accuracy (by 1–8%). A thorough examination reveals the disease's presence and the prevalence of symptoms in patients. Conclusions: A similar approach can be generalized to other infectious diseases. It is worthwhile to use prior knowledge acquired through transfer learning when researching other infectious diseases.
first_indexed 2024-04-10T19:43:19Z
format Article
id doaj.art-a2e8467df93f4f979d094edb5028e212
institution Directory Open Access Journal
issn 1472-6947
language English
last_indexed 2024-04-10T19:43:19Z
publishDate 2023-01-01
publisher BMC
record_format Article
series BMC Medical Informatics and Decision Making
spelling doaj.art-a2e8467df93f4f979d094edb5028e212; 2023-01-29T12:14:09Z; eng; BMC; BMC Medical Informatics and Decision Making; 1472-6947; 2023-01-01; 23; 1; 1; 17; 10.1186/s12911-023-02117-3; Entity and relation extraction from clinical case reports of COVID-19: a natural language processing approach; Shaina Raza (Public Health Ontario (PHO)); Brian Schwartz (Public Health Ontario (PHO)); https://doi.org/10.1186/s12911-023-02117-3; Natural language processing; Data cohort; COVID-19; Named entity; Relation extraction; Transfer learning
title Entity and relation extraction from clinical case reports of COVID-19: a natural language processing approach
topic Natural language processing
Data cohort
COVID-19
Named entity
Relation extraction
Transfer learning
url https://doi.org/10.1186/s12911-023-02117-3