Extracting clinical named entity for pituitary adenomas from Chinese electronic medical records

Abstract Objective: Pituitary adenomas are the most common type of pituitary disorder; they usually occur in young adults and often affect the patient’s physical development, labor capacity and fertility. Clinical free texts noted in electronic medical records (EMRs) of pituitary adenoma patients...

Bibliographic Details
Main Authors:	An Fang, Jiahui Hu, Wanqing Zhao, Ming Feng, Ji Fu, Shanshan Feng, Pei Lou, Huiling Ren, Xianlai Chen
Format:	Article
Language:	English
Published:	BMC 2022-03-01
Series:	BMC Medical Informatics and Decision Making
Subjects:	Clinical information extraction Pituitary adenomas Chinese electronic medical records Clinical named entity recognition Deep learning
Online Access:	https://doi.org/10.1186/s12911-022-01810-z

author	An Fang Jiahui Hu Wanqing Zhao Ming Feng Ji Fu Shanshan Feng Pei Lou Huiling Ren Xianlai Chen
author_sort	An Fang
collection	DOAJ
description	Abstract Objective: Pituitary adenomas are the most common type of pituitary disorder; they usually occur in young adults and often affect the patient’s physical development, labor capacity and fertility. Clinical free texts noted in electronic medical records (EMRs) of pituitary adenoma patients contain abundant diagnosis and treatment information. However, this information has not been well utilized because extracting information from unstructured clinical texts is difficult. This study aims to enable machines to process clinical information intelligently and to extract clinical named entities for pituitary adenomas from Chinese EMRs automatically. Methods: The clinical corpus used in this study came from one pituitary adenoma neurosurgery treatment center of a 3A hospital in China. Four types of fine-grained clinical record text were selected: notes on present illness, past medical history, case characteristics and family history from 500 pituitary adenoma inpatients. Dictionary-based matching, conditional random fields (CRF), bidirectional long short-term memory with CRF (BiLSTM-CRF), and bidirectional encoder representations from transformers with BiLSTM-CRF (BERT-BiLSTM-CRF) were used to extract clinical entities from a Chinese EMR corpus. For the dictionary-based matching method, a comprehensive dictionary was constructed from open source vocabularies and a domain dictionary for pituitary adenomas. Features such as part of speech, radical, document type, and character position were selected to train the CRF-based model. Random character embeddings were used as input features for the BiLSTM-CRF model, and character embeddings pretrained by BERT were used for the BERT-BiLSTM-CRF model. Both strict and relaxed metrics were used to evaluate the performance of these methods.
Results: Experimental results demonstrated that the deep learning and other machine learning methods were able to automatically extract clinical named entities, including symptoms, body regions, diseases, family histories, surgeries, medications, and disease courses of pituitary adenomas, from Chinese EMRs. In overall performance, BERT-BiLSTM-CRF achieved the highest strict F1 value (91.27%) and the highest relaxed F1 value (95.57%). Additional evaluations showed that BERT-BiLSTM-CRF performed best on almost all entity types except surgery and disease course. BiLSTM-CRF performed best in disease course entity recognition, matching the CRF model that used part-of-speech, radical and document-type features, with strict and relaxed F1 values both reaching 96.48%. The CRF model with part-of-speech, radical and document-type features performed best in surgery entity recognition, with a relaxed F1 value of 95.29%. Conclusions: In this study, we applied four entity recognition methods to pituitary adenomas based on Chinese EMRs. The results demonstrate that the deep learning methods can effectively extract various types of clinical entities with satisfactory performance. This study contributes to clinical named entity extraction from Chinese neurosurgical EMRs. The findings could also assist information extraction from other Chinese medical texts.
first_indexed	2024-12-13T20:01:03Z
format	Article
id	doaj.art-e63155918cd947649421785ba596ff89
institution	Directory Open Access Journal
issn	1472-6947
language	English
last_indexed	2024-12-13T20:01:03Z
publishDate	2022-03-01
publisher	BMC
record_format	Article
series	BMC Medical Informatics and Decision Making
spelling	doaj.art-e63155918cd947649421785ba596ff89; 2022-12-21T23:33:12Z; eng; BMC; BMC Medical Informatics and Decision Making; 1472-6947; 2022-03-01; vol. 22, no. 1, pp. 1-14; 10.1186/s12911-022-01810-z; Extracting clinical named entity for pituitary adenomas from Chinese electronic medical records; An Fang (Life Science College, Central South University); Jiahui Hu (Institute of Medical Information, Chinese Academy of Medical Sciences); Wanqing Zhao (Institute of Medical Information, Chinese Academy of Medical Sciences); Ming Feng (Dongcheng District, Peking Union Medical College Hospital); Ji Fu (Dongcheng District, Peking Union Medical College Hospital); Shanshan Feng (Dongcheng District, Peking Union Medical College Hospital); Pei Lou (Institute of Medical Information, Chinese Academy of Medical Sciences); Huiling Ren (Institute of Medical Information, Chinese Academy of Medical Sciences); Xianlai Chen (Big Data Institute, Central South University)
title	Extracting clinical named entity for pituitary adenomas from Chinese electronic medical records
topic	Clinical information extraction Pituitary adenomas Chinese electronic medical records Clinical named entity recognition Deep learning
url	https://doi.org/10.1186/s12911-022-01810-z
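
The abstract names dictionary-based matching and strict versus relaxed F1 scoring without defining them. The following is a minimal sketch, not the authors' implementation. It assumes forward maximum matching for the dictionary lookup. It also assumes the common convention that a strict match needs identical boundaries and entity type, while a relaxed match needs only an overlapping span of the same type. The function names, the toy lexicon and the example sentence are all hypothetical.

```python
from typing import Dict, List, Tuple

# (start, end_exclusive, entity_type); offsets are character positions
Span = Tuple[int, int, str]


def dictionary_match(text: str, lexicon: Dict[str, str]) -> List[Span]:
    """Forward maximum matching: at each position keep the longest lexicon entry."""
    max_len = max((len(term) for term in lexicon), default=0)
    spans: List[Span] = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            label = lexicon.get(text[i:i + length])
            if label is not None:
                spans.append((i, i + length, label))
                i += length
                break
        else:
            i += 1  # no entry starts here; advance one character
    return spans


def overlaps(a: Span, b: Span) -> bool:
    """Assumed relaxed criterion: same entity type and at least one shared character."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def f1(gold: List[Span], pred: List[Span], relaxed: bool = False) -> float:
    """Entity-level F1 under the strict or the relaxed matching assumption."""
    if relaxed:
        hit_pred = sum(any(overlaps(p, g) for g in gold) for p in pred)
        hit_gold = sum(any(overlaps(g, p) for p in pred) for g in gold)
    else:
        gold_set, pred_set = set(gold), set(pred)
        hit_pred = sum(p in gold_set for p in pred)
        hit_gold = sum(g in pred_set for g in gold)
    precision = hit_pred / len(pred) if pred else 0.0
    recall = hit_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence: "the patient has headache with decreased vision"
text = "患者头痛伴视力下降"
# Toy lexicon that knows "headache" but only the shorter term "vision"
lexicon = {"头痛": "symptom", "视力": "symptom"}
# Hypothetical gold annotation: "headache" and "decreased vision"
gold = [(2, 4, "symptom"), (5, 9, "symptom")]

pred = dictionary_match(text, lexicon)
print(pred)                          # [(2, 4, 'symptom'), (5, 7, 'symptom')]
print(f1(gold, pred))                # 0.5
print(f1(gold, pred, relaxed=True))  # 1.0
```

In this toy case the second prediction covers only part of the gold span, so it counts under the relaxed criterion but not the strict one. That illustrates why a relaxed F1 is never lower than the strict F1 for the same predictions.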

Similar Items