Entity and relation extraction from clinical case reports of COVID-19: a natural language processing approach
Abstract. Background: Extracting relevant information about infectious diseases is an essential task. However, a significant obstacle in supporting public health research is the lack of methods for effectively mining large amounts of health data. Objective: This study aims to use natural language proce...
Main Authors: | Shaina Raza, Brian Schwartz |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2023-01-01 |
Series: | BMC Medical Informatics and Decision Making |
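Subjects: | Natural language processing, Data cohort, COVID-19, Named entity, Relation extraction, Transfer learning |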
Online Access: | https://doi.org/10.1186/s12911-023-02117-3 |
_version_ | 1811175827480510464 |
---|---|
author | Shaina Raza; Brian Schwartz |
author_facet | Shaina Raza; Brian Schwartz |
author_sort | Shaina Raza |
collection | DOAJ |
description | Abstract. Background: Extracting relevant information about infectious diseases is an essential task. However, a significant obstacle in supporting public health research is the lack of methods for effectively mining large amounts of health data. Objective: This study aims to use natural language processing (NLP) to extract key information (clinical factors, social determinants of health) from published cases in the literature. Methods: The proposed framework integrates a data layer for preparing a data cohort from clinical case reports; an NLP layer to find the clinical and demographic named entities and relations in the texts; and an evaluation layer for benchmarking performance and analysis. The focus of this study is to extract valuable information from COVID-19 case reports. Results: The named entity recognition implementation in the NLP layer achieves a performance gain of about 1–3% compared to benchmark methods. Furthermore, even without extensive data labeling, the relation extraction method outperforms benchmark methods in accuracy by 1–8%. A thorough examination reveals the disease’s presence and symptom prevalence in patients. Conclusions: A similar approach can be generalized to other infectious diseases. It is worthwhile to use prior knowledge acquired through transfer learning when researching other infectious diseases. |
first_indexed | 2024-04-10T19:43:19Z |
format | Article |
id | doaj.art-a2e8467df93f4f979d094edb5028e212 |
institution | Directory Open Access Journal |
issn | 1472-6947 |
language | English |
last_indexed | 2024-04-10T19:43:19Z |
publishDate | 2023-01-01 |
publisher | BMC |
record_format | Article |
series | BMC Medical Informatics and Decision Making |
spelling | doaj.art-a2e8467df93f4f979d094edb5028e212; 2023-01-29T12:14:09Z; eng; BMC; BMC Medical Informatics and Decision Making; 1472-6947; 2023-01-01; 231117; 10.1186/s12911-023-02117-3; Entity and relation extraction from clinical case reports of COVID-19: a natural language processing approach; Shaina Raza0; Brian Schwartz1; Public Health Ontario (PHO); Public Health Ontario (PHO); Abstract. Background: Extracting relevant information about infectious diseases is an essential task. However, a significant obstacle in supporting public health research is the lack of methods for effectively mining large amounts of health data. Objective: This study aims to use natural language processing (NLP) to extract key information (clinical factors, social determinants of health) from published cases in the literature. Methods: The proposed framework integrates a data layer for preparing a data cohort from clinical case reports; an NLP layer to find the clinical and demographic named entities and relations in the texts; and an evaluation layer for benchmarking performance and analysis. The focus of this study is to extract valuable information from COVID-19 case reports. Results: The named entity recognition implementation in the NLP layer achieves a performance gain of about 1–3% compared to benchmark methods. Furthermore, even without extensive data labeling, the relation extraction method outperforms benchmark methods in accuracy by 1–8%. A thorough examination reveals the disease’s presence and symptom prevalence in patients. Conclusions: A similar approach can be generalized to other infectious diseases. It is worthwhile to use prior knowledge acquired through transfer learning when researching other infectious diseases.; https://doi.org/10.1186/s12911-023-02117-3; Natural language processing; Data cohort; COVID-19; Named entity; Relation extraction; Transfer learning |
spellingShingle | Shaina Raza; Brian Schwartz; Entity and relation extraction from clinical case reports of COVID-19: a natural language processing approach; BMC Medical Informatics and Decision Making; Natural language processing; Data cohort; COVID-19; Named entity; Relation extraction; Transfer learning |
title | Entity and relation extraction from clinical case reports of COVID-19: a natural language processing approach |
title_sort | entity and relation extraction from clinical case reports of covid 19 a natural language processing approach |
topic | Natural language processing; Data cohort; COVID-19; Named entity; Relation extraction; Transfer learning |
url | https://doi.org/10.1186/s12911-023-02117-3 |
work_keys_str_mv | AT shainaraza entityandrelationextractionfromclinicalcasereportsofcovid19anaturallanguageprocessingapproach AT brianschwartz entityandrelationextractionfromclinicalcasereportsofcovid19anaturallanguageprocessingapproach |