LSTMVoter: chemical named entity recognition using a conglomerate of sequence labeling tools

Abstract. Background: Chemical and biomedical named entity recognition (NER) is an essential preprocessing task in natural language processing. The identification and extraction of named entities from scientific articles is also attracting increasing interest in many scientific disciplines. Locating c...

Bibliographic Details
Main Authors: Wahed Hemati, Alexander Mehler
Format: Article
Language: English
Published: BMC 2019-01-01
Series: Journal of Cheminformatics
Subjects: BioCreative V.5; CEMP; CHEMDNER; BioNLP; Named entity recognition; Deep learning
Online Access: http://link.springer.com/article/10.1186/s13321-018-0327-2
author Wahed Hemati
Alexander Mehler
collection DOAJ
description Abstract
Background: Chemical and biomedical named entity recognition (NER) is an essential preprocessing task in natural language processing. The identification and extraction of named entities from scientific articles is also attracting increasing interest in many scientific disciplines. Locating chemical named entities in the literature is an essential step in chemical text mining pipelines for identifying chemical mentions, their properties, and relations as discussed in the literature. In this work, we describe an approach to the BioCreative V.5 challenge regarding the recognition and classification of chemical named entities. For this purpose, we transform the task of NER into a sequence labeling problem. We present a series of sequence labeling systems that we used, adapted and optimized in our experiments for solving this task. To this end, we experiment with hyperparameter optimization. Finally, we present LSTMVoter, a two-stage application of recurrent neural networks that integrates the optimized sequence labelers from our study into a single ensemble classifier.
Results: We introduce LSTMVoter, a bidirectional long short-term memory (LSTM) tagger that utilizes a conditional random field layer in conjunction with attention-based feature modeling. Our approach explores information about features that is modeled by means of an attention mechanism. LSTMVoter outperforms each extractor integrated by it in a series of experiments. On the BioCreative IV chemical compound and drug name recognition (CHEMDNER) corpus, LSTMVoter achieves an F1-score of 90.04%; on the BioCreative V.5 chemical entity mention in patents corpus, it achieves an F1-score of 89.01%.
Availability and implementation: Data and code are available at https://github.com/texttechnologylab/LSTMVoter.
first_indexed 2024-12-23T04:30:01Z
format Article
id doaj.art-beb6d613fd7a49578cf84fa7a947153c
institution Directory Open Access Journal
issn 1758-2946
language English
last_indexed 2024-12-23T04:30:01Z
publishDate 2019-01-01
publisher BMC
record_format Article
series Journal of Cheminformatics
spelling doaj.art-beb6d613fd7a49578cf84fa7a947153c; 2022-12-21T18:00:04Z; eng; BMC; Journal of Cheminformatics; 1758-2946; 2019-01-01; 11117; 10.1186/s13321-018-0327-2; LSTMVoter: chemical named entity recognition using a conglomerate of sequence labeling tools; Wahed Hemati (Text Technology Lab, Goethe-University Frankfurt); Alexander Mehler (Text Technology Lab, Goethe-University Frankfurt); http://link.springer.com/article/10.1186/s13321-018-0327-2; BioCreative V.5; CEMP; CHEMDNER; BioNLP; Named entity recognition; Deep learning
title LSTMVoter: chemical named entity recognition using a conglomerate of sequence labeling tools
topic BioCreative V.5
CEMP
CHEMDNER
BioNLP
Named entity recognition
Deep learning
url http://link.springer.com/article/10.1186/s13321-018-0327-2
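
The description above states that NER is transformed into a sequence labeling problem and that results are reported as F1-scores. The following is a minimal illustrative sketch of that general idea, not the authors' code: it assumes a BIO tagging scheme, whitespace tokenization, a single hypothetical label "CHEM", and exact-match span scoring, none of which are specified in this record. The example sentence and the predicted tags are invented for illustration.

```python
from typing import List, Tuple

# A span is (start_token, end_token_exclusive); token-based spans are an assumption.
Span = Tuple[int, int]


def spans_to_bio(n_tokens: int, spans: List[Span], label: str = "CHEM") -> List[str]:
    """Encode entity spans as one BIO tag per token."""
    tags = ["O"] * n_tokens
    for start, end in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags back into entity spans (labels are ignored here)."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        # A new B- tag or an O tag closes any span that is currently open.
        if (tag.startswith("B-") or tag == "O") and start is not None:
            spans.append((start, i))
            start = None
        if tag.startswith("B-"):
            start = i
        elif tag.startswith("I-") and start is None:
            start = i  # tolerate an I- tag that lacks a preceding B- tag
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def f1_score(gold: List[Span], pred: List[Span]) -> float:
    """Exact-match, span-level F1 (harmonic mean of precision and recall)."""
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Invented example: two chemical mentions, one of which the tagger misses.
tokens = "Treatment with acetylsalicylic acid and ibuprofen reduced pain".split()
gold_spans = [(2, 4), (5, 6)]
gold_tags = spans_to_bio(len(tokens), gold_spans)
pred_tags = ["O", "O", "B-CHEM", "I-CHEM", "O", "O", "O", "O"]
pred_spans = bio_to_spans(pred_tags)
score = f1_score(gold_spans, pred_spans)
```

In this sketch the tagger only has to predict one tag per token, which is what makes the problem a sequence labeling task; entity boundaries are recovered afterwards by decoding the tags. An ensemble such as the one the description mentions would combine the tag sequences of several base labelers before decoding, which is not shown here.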