A multitask bi-directional RNN model for named entity recognition on Chinese electronic medical records

Abstract Background An Electronic Medical Record (EMR) comprises patients’ medical information gathered by medical staff to provide better health care. Named Entity Recognition (NER) is a sub-field of information extraction aimed at identifying specific entity terms such as disease, test, symptom, g...

Bibliographic Details
Main Authors:	Shanta Chowdhury, Xishuang Dong, Lijun Qian, Xiangfang Li, Yi Guan, Jinfeng Yang, Qiubin Yu
Format:	Article
Language:	English
Published:	BMC 2018-12-01
Series:	BMC Bioinformatics
Subjects:	Recurrent neural network, Multitask learning, Word embedding, Parts-of-speech tagging, Named entity recognition, Electronic medical records
Online Access:	http://link.springer.com/article/10.1186/s12859-018-2467-9

_version_	1818448634447921152
author	Shanta Chowdhury, Xishuang Dong, Lijun Qian, Xiangfang Li, Yi Guan, Jinfeng Yang, Qiubin Yu
author_sort	Shanta Chowdhury
collection	DOAJ
description	Abstract. Background: An Electronic Medical Record (EMR) comprises patients’ medical information gathered by medical staff to provide better health care. Named Entity Recognition (NER) is a sub-field of information extraction aimed at identifying specific entity terms such as diseases, tests, symptoms, and genes. NER can relieve healthcare providers and medical specialists by extracting useful information automatically and filtering out unnecessary and unrelated information in EMRs. However, the limited availability of EMR data poses a great challenge for mining entity terms. Therefore, a multitask bi-directional RNN model is proposed here as a potential data augmentation solution to enhance NER performance with limited data. Methods: A multitask bi-directional RNN model is proposed for extracting entity terms from Chinese EMRs. The proposed model can be divided into a shared layer and a task-specific layer. First, a vector representation of each word is obtained as a concatenation of its word embedding and character embedding. A bi-directional RNN is then used to extract context information from the sentence. These layers are shared by two different task layers, namely the parts-of-speech tagging task layer and the named entity recognition task layer. The two task layers are trained alternately so that the knowledge learned from the named entity recognition task can be enhanced by the knowledge gained from the parts-of-speech tagging task. Results: The performance of the proposed model has been evaluated in terms of micro-average F-score, macro-average F-score, and accuracy. The proposed model outperforms the baseline model in all cases. For instance, experimental results on the discharge summaries show that the micro-average F-score and the macro-average F-score are improved by 2.41 and 4.16 percentage points, respectively, and the overall accuracy is improved by 5.66 percentage points. Conclusions: In this paper, a novel multitask bi-directional RNN model is proposed for improving the performance of named entity recognition in EMRs. Evaluation results using real datasets demonstrate the effectiveness of the proposed model.
first_indexed	2024-12-14T20:22:38Z
format	Article
id	doaj.art-4868c0c2b38e4341928579771b5a2862
institution	Directory of Open Access Journals
issn	1471-2105
language	English
last_indexed	2024-12-14T20:22:38Z
publishDate	2018-12-01
publisher	BMC
record_format	Article
series	BMC Bioinformatics
spelling	doaj.art-4868c0c2b38e4341928579771b5a2862 | 2022-12-21T22:48:42Z | eng | BMC | BMC Bioinformatics | 1471-2105 | 2018-12-01 | 19 | S17 | 75 | 84 | 10.1186/s12859-018-2467-9 | A multitask bi-directional RNN model for named entity recognition on Chinese electronic medical records | Shanta Chowdhury, Xishuang Dong, Lijun Qian, Xiangfang Li (Center of Excellence in Research and Education for Big Military Data Intelligence (CREDIT), Department of Electrical and Computer Engineering, Prairie View A&M University, Texas A&M University System) | Yi Guan (Schools of Computer Science and Technology, Harbin Institute of Technology) | Jinfeng Yang (Schools of Software, Harbin University of Science and Technology) | Qiubin Yu (Second Affiliated Hospital of Harbin Medical University)
title	A multitask bi-directional RNN model for named entity recognition on Chinese electronic medical records
title_sort	multitask bi directional rnn model for named entity recognition on chinese electronic medical records
topic	Recurrent neural network, Multitask learning, Word embedding, Parts-of-speech tagging, Named entity recognition, Electronic medical records
url	http://link.springer.com/article/10.1186/s12859-018-2467-9
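
The architecture outlined in the description field (word and character embeddings concatenated, a shared bi-directional RNN, and separate parts-of-speech and NER task layers trained in alternation) can be illustrated with a minimal sketch. This is not the authors' implementation: the plain tanh recurrent cell, the random weights, all dimensions, the class and function names, and the choice to start the alternation with the parts-of-speech task are assumptions made for illustration. Only the forward pass and the task schedule are shown, with no gradient updates.

```python
import math
import random

random.seed(0)

def rand_matrix(rows, cols):
    """Small random weight matrix stored as a list of rows (illustrative only)."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    """Matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def rnn_pass(inputs, w_in, w_rec):
    """Plain tanh RNN over a sequence (assumed cell type); one hidden state per step."""
    h = [0.0] * len(w_rec)
    states = []
    for x in inputs:
        h = [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]
        states.append(h)
    return states

class MultitaskBiRNN:
    """Hypothetical sketch: shared bi-directional encoder plus two task layers."""

    def __init__(self, word_dim, char_dim, hidden, n_pos_tags, n_ner_tags):
        in_dim = word_dim + char_dim
        # Shared layers: forward and backward recurrent weights.
        self.fw_in, self.fw_rec = rand_matrix(hidden, in_dim), rand_matrix(hidden, hidden)
        self.bw_in, self.bw_rec = rand_matrix(hidden, in_dim), rand_matrix(hidden, hidden)
        # Task-specific layers: one linear scorer per task.
        self.heads = {
            "pos": rand_matrix(n_pos_tags, 2 * hidden),
            "ner": rand_matrix(n_ner_tags, 2 * hidden),
        }

    def encode(self, word_vecs, char_vecs):
        # Each word is represented by its word embedding concatenated
        # with its character embedding.
        inputs = [w + c for w, c in zip(word_vecs, char_vecs)]
        fw = rnn_pass(inputs, self.fw_in, self.fw_rec)
        # Backward direction: run over the reversed sequence, then re-align.
        bw = rnn_pass(inputs[::-1], self.bw_in, self.bw_rec)[::-1]
        return [f + b for f, b in zip(fw, bw)]

    def predict(self, task, word_vecs, char_vecs):
        # Both tasks reuse the same encoder; only the output layer differs.
        scores = [matvec(self.heads[task], h) for h in self.encode(word_vecs, char_vecs)]
        return [max(range(len(s)), key=s.__getitem__) for s in scores]

def alternating_schedule(n_steps):
    """Task to update at each training step (starting with POS is an assumption)."""
    return ["pos" if step % 2 == 0 else "ner" for step in range(n_steps)]

# Toy usage with random vectors standing in for learned embeddings.
model = MultitaskBiRNN(word_dim=4, char_dim=3, hidden=5, n_pos_tags=6, n_ner_tags=4)
words = [[random.random() for _ in range(4)] for _ in range(3)]
chars = [[random.random() for _ in range(3)] for _ in range(3)]
encoded = model.encode(words, chars)
pos_tags = model.predict("pos", words, chars)
ner_tags = model.predict("ner", words, chars)
schedule = alternating_schedule(4)
```

In a trainable version, each step of the schedule would compute the loss of the selected task and update the shared encoder together with that task's output layer, which is how the parts-of-speech signal could reach the representation used for NER.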

Similar Items