Construction of cardiovascular information extraction corpus based on electronic medical records

Bibliographic Details
Main Authors: Hongyang Chang, Hongying Zan, Shuai Zhang, Bingfei Zhao, Kunli Zhang
Format: Article
Language: English
Published: AIMS Press 2023-06-01
Series: Mathematical Biosciences and Engineering
Subjects: cardiovascular disease; corpus construction; electronic medical record
Online Access: https://www.aimspress.com/article/doi/10.3934/mbe.2023596?viewType=HTML
author Hongyang Chang
Hongying Zan
Shuai Zhang
Bingfei Zhao
Kunli Zhang
collection DOAJ
description Cardiovascular disease places a heavy burden on both society and patients, which motivates knowledge-based research such as knowledge graph construction and automated question answering. However, existing work on corpus construction for cardiovascular disease is relatively limited, and this has hindered further knowledge-based research on the disease. Electronic medical records contain patient data that span the entire diagnosis and treatment process and include a large amount of reliable medical information. We therefore collected electronic medical record data related to cardiovascular disease, combined the data with relevant work experience, and developed a standard for labeling entities and entity relations in cardiovascular electronic medical records. Using a rule-based semi-automatic method to build a sentence-level dictionary of labeling results, we constructed a cardiovascular electronic medical record entity and entity relation labeling corpus (CVDEMRC). The CVDEMRC contains 7,691 entities and 11,185 entity relation triples. In the consistency examination, annotation consistency was 93.51% for entities and 84.02% for entity relations, indicating good agreement. The CVDEMRC constructed in this study is expected to provide a database for information extraction research related to cardiovascular diseases.
first_indexed 2024-03-13T02:55:07Z
format Article
id doaj.art-f9a7e1928e6644da938206ca9561d6d2
institution Directory Open Access Journal
issn 1551-0018
language English
last_indexed 2024-03-13T02:55:07Z
publishDate 2023-06-01
publisher AIMS Press
record_format Article
series Mathematical Biosciences and Engineering
spelling doaj.art-f9a7e1928e6644da938206ca9561d6d2; 2023-06-28T06:41:36Z; eng; AIMS Press; Mathematical Biosciences and Engineering; ISSN 1551-0018; 2023-06-01; vol. 20, no. 7, pp. 13379-13397; DOI 10.3934/mbe.2023596
Construction of cardiovascular information extraction corpus based on electronic medical records
Hongyang Chang (1), Hongying Zan (1, 2), Shuai Zhang (1), Bingfei Zhao (1), Kunli Zhang (1, 2)
1. School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, China
2. Peng Cheng Laboratory, Shenzhen, China
https://www.aimspress.com/article/doi/10.3934/mbe.2023596?viewType=HTML
title Construction of cardiovascular information extraction corpus based on electronic medical records
topic cardiovascular disease
corpus construction
electronic medical record
url https://www.aimspress.com/article/doi/10.3934/mbe.2023596?viewType=HTML