Iterative Annotation of Biomedical NER Corpora with Deep Neural Networks and Knowledge Bases

The large availability of clinical natural language documents, such as clinical narratives or diagnoses, requires the definition of smart automatic systems for their processing and analysis, but the lack of annotated corpora in the biomedical domain, especially in languages different from English, m...


Bibliographic Details
Main Authors:	Stefano Silvestri, Francesco Gargiulo, Mario Ciampi
Format:	Article
Language:	English
Published:	MDPI AG 2022-06-01
Series:	Applied Sciences
Subjects:	biomedical NER; corpus annotation; distant supervision; active learning; deep learning
Online Access:	https://www.mdpi.com/2076-3417/12/12/5775

_version_	1827662773227618304
author	Stefano Silvestri; Francesco Gargiulo; Mario Ciampi
author_facet	Stefano Silvestri; Francesco Gargiulo; Mario Ciampi
author_sort	Stefano Silvestri
collection	DOAJ
description	The large availability of clinical natural language documents, such as clinical narratives or diagnoses, requires the definition of smart automatic systems for their processing and analysis, but the lack of annotated corpora in the biomedical domain, especially in languages different from English, makes it difficult to exploit state-of-the-art machine-learning systems to extract information from such documents. As a result, healthcare professionals miss significant opportunities that could arise from the analysis of these data. In this paper, we propose a methodology to reduce the manual effort needed to annotate a biomedical named entity recognition (B-NER) corpus. The methodology exploits both active learning and distant supervision, based respectively on deep learning models (e.g., Bi-LSTM, word2vec, FastText, ELMo and BERT) and on biomedical knowledge bases, in order to speed up the annotation task and limit class imbalance issues. We assessed this approach by creating an Italian-language electronic health record corpus annotated with biomedical domain entities in a small fraction of the time required for fully manual annotation. The resulting corpus was used to train a B-NER deep neural network whose performance is comparable with the state of the art, with F1-scores of 0.9661 and 0.8875 on two test sets.
first_indexed	2024-03-10T00:32:39Z
format	Article
id	doaj.art-91949e0f0e4446a6894a2e7c6276c91d
institution	Directory of Open Access Journals
issn	2076-3417
language	English
last_indexed	2024-03-10T00:32:39Z
publishDate	2022-06-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj.art-91949e0f0e4446a6894a2e7c6276c91d; 2023-11-23T15:22:07Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2022-06-01; vol. 12, no. 12, art. 5775; doi:10.3390/app12125775; Iterative Annotation of Biomedical NER Corpora with Deep Neural Networks and Knowledge Bases; Stefano Silvestri, Francesco Gargiulo, Mario Ciampi (all: Institute for High Performance Computing and Networking of National Research Council, ICAR-CNR, Via Pietro Castellino 111, 80131 Naples, Italy); https://www.mdpi.com/2076-3417/12/12/5775; biomedical NER; corpus annotation; distant supervision; active learning; deep learning
title	Iterative Annotation of Biomedical NER Corpora with Deep Neural Networks and Knowledge Bases
topic	biomedical NER; corpus annotation; distant supervision; active learning; deep learning
url	https://www.mdpi.com/2076-3417/12/12/5775


Similar Items