We are not ready yet: limitations of state-of-the-art disease named entity recognizers


Bibliographic Details
Main Authors: Lisa Kühnel, Juliane Fluck
Format: Article
Language: English
Published: BMC 2022-10-01
Series: Journal of Biomedical Semantics
Subjects: Text mining, bioNLP, BERT, Manual Curation
Online Access: https://doi.org/10.1186/s13326-022-00280-6
_version_ 1797991804497297408
author Lisa Kühnel
Juliane Fluck
author_facet Lisa Kühnel
Juliane Fluck
author_sort Lisa Kühnel
collection DOAJ
description Abstract Background Intense research has been conducted in the area of biomedical natural language processing. Since the breakthrough of transfer learning-based methods, BERT models have been used in a variety of biomedical and clinical applications. On the available data sets, these models show excellent results, partly exceeding the inter-annotator agreement. However, biomedical named entity recognition applied to COVID-19 preprints shows a performance drop compared to the results on the corresponding test data. This raises the question of how well trained models are able to predict on completely new data, i.e. to generalize. Results Using disease named entity recognition as an example, we investigate the robustness of different machine learning-based methods, including transfer learning, and show that current state-of-the-art methods work well on a given training set and the corresponding test set but generalize poorly when applied to new data. Conclusions We argue that larger annotated data sets are needed for training and testing. We therefore foresee the curation of further data sets and, moreover, the investigation of continual learning processes for machine learning-based models.
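As a minimal illustration of the setup the abstract refers to, the sketch below applies a fine-tuned BERT-style disease NER model to benchmark-like text and to COVID-19-preprint-like text. The Hugging Face transformers pipeline, the example checkpoint name, and the sample sentences are illustrative assumptions only, not the authors' actual models or data sets.

```python
# Minimal sketch (an assumption, not the authors' code): applying a fine-tuned
# BERT-style disease NER model to in-domain-like and to "new" text, i.e. the kind
# of generalization check the abstract describes.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="alvaroalon2/biobert_diseases_ner",  # example public checkpoint, chosen for illustration
    aggregation_strategy="simple",             # merge sub-word tokens into entity spans
)

# Sentence resembling classic benchmark corpora (e.g. NCBI Disease style)
print(ner("Mutations in BRCA1 are associated with hereditary breast cancer."))

# Sentence resembling COVID-19 preprint text, where the paper reports a performance drop
print(ner("Patients with long COVID report persistent fatigue and dyspnea months after infection."))
```

Scoring such predictions against annotated gold spans (entity-level precision, recall, and F1) on an in-domain test set versus a newly curated corpus is the kind of comparison that exposes the generalization gap described in the Results.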
first_indexed 2024-04-11T08:58:02Z
format Article
id doaj.art-1e7234459f6342e1b7ac674cafdab3ea
institution Directory Open Access Journal
issn 2041-1480
language English
last_indexed 2024-04-11T08:58:02Z
publishDate 2022-10-01
publisher BMC
record_format Article
series Journal of Biomedical Semantics
spelling doaj.art-1e7234459f6342e1b7ac674cafdab3ea (2022-12-22T04:33:08Z, eng). BMC, Journal of Biomedical Semantics, ISSN 2041-1480, 2022-10-01, vol. 13, no. 1, pp. 1-10, doi:10.1186/s13326-022-00280-6. We are not ready yet: limitations of state-of-the-art disease named entity recognizers. Lisa Kühnel (ZB MED - Information Centre for Life Sciences); Juliane Fluck (ZB MED - Information Centre for Life Sciences). Topics: Text mining, bioNLP, BERT, Manual Curation. https://doi.org/10.1186/s13326-022-00280-6
spellingShingle Lisa Kühnel
Juliane Fluck
We are not ready yet: limitations of state-of-the-art disease named entity recognizers
Journal of Biomedical Semantics
Text mining
bioNLP
BERT
Manual Curation
title We are not ready yet: limitations of state-of-the-art disease named entity recognizers
title_full We are not ready yet: limitations of state-of-the-art disease named entity recognizers
title_fullStr We are not ready yet: limitations of state-of-the-art disease named entity recognizers
title_full_unstemmed We are not ready yet: limitations of state-of-the-art disease named entity recognizers
title_short We are not ready yet: limitations of state-of-the-art disease named entity recognizers
title_sort we are not ready yet limitations of state of the art disease named entity recognizers
topic Text mining
bioNLP
BERT
Manual Curation
url https://doi.org/10.1186/s13326-022-00280-6
work_keys_str_mv AT lisakuhnel wearenotreadyyetlimitationsofstateoftheartdiseasenamedentityrecognizers
AT julianefluck wearenotreadyyetlimitationsofstateoftheartdiseasenamedentityrecognizers