PNER: Applying the Pipeline Method to Resolve Nested Issues in Named Entity Recognition

Named entity recognition (NER) in natural language processing encompasses three primary types: flat, nested, and discontinuous. While the flat type often garners attention from researchers, nested NER poses a significant challenge. Current approaches to addressing nested NER involve sequence labelin...

Bibliographic Details
Main Authors:	Hongjian Yang, Qinghao Zhang, Hyuk-Chul Kwon
Format:	Article
Language:	English
Published:	MDPI AG 2024-02-01
Series:	Applied Sciences
Subjects:	nested entity; named entity recognition; NER; sequence labeling; text classification; merged label
Online Access:	https://www.mdpi.com/2076-3417/14/5/1717

author	Hongjian Yang Qinghao Zhang Hyuk-Chul Kwon
collection	DOAJ
description	Named entity recognition (NER) in natural language processing encompasses three primary types: flat, nested, and discontinuous. While the flat type often garners attention from researchers, nested NER poses a significant challenge. Current approaches to addressing nested NER involve sequence labeling methods with merged label layers, cascaded models, and those rooted in reading comprehension. Among these, sequence labeling with merged label layers stands out for its simplicity and ease of implementation. Yet, highlighted issues persist within this method, prompting our aim to enhance its efficacy. In this study, we propose augmentations to the sequence labeling approach by employing a pipeline model bifurcated into sequence labeling and text classification tasks. Departing from annotating specific entity categories, we amalgamated types into main and sub-categories for a unified treatment. These categories were subsequently embedded as identifiers in the recognition text for the text categorization task. Our choice of resolution involved BERT+BiLSTM+CRF for sequence labeling and the BERT model for text classification. Experiments were conducted across three nested NER datasets: GENIA, CMeEE, and GermEval 2014, featuring annotations varying from four to two levels. Before model training, we conducted separate statistical analyses on nested entities within the medical dataset CMeEE and the everyday life dataset GermEval 2014. Our research unveiled a consistent dominance of a particular entity category within nested entities across both datasets. This observation suggests the potential utility of labeling primary and subsidiary entities for effective category recognition. Model performance was evaluated based on F1 scores, considering correct recognition only when both the complete entity name and category were identified. Results showcased substantial performance enhancement after our proposed modifications compared to the original method. Additionally, our improved model exhibited strong competitiveness against existing models. F1 scores on the GENIA, CMeEE, and GermEval 2014 datasets reached 79.21, 66.71, and 87.81, respectively. Our research highlights that, while preserving the original method’s simplicity and implementation ease, our enhanced model achieves heightened performance and competitive prowess compared to other methodologies.
first_indexed	2024-04-25T00:35:46Z
format	Article
id	doaj.art-b8472369efc747d6b2e8758d9abe9aaa
institution	Directory of Open Access Journals
issn	2076-3417
language	English
last_indexed	2024-04-25T00:35:46Z
publishDate	2024-02-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj.art-b8472369efc747d6b2e8758d9abe9aaa; 2024-03-12T16:38:38Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2024-02-01; 14; 5; 1717; 10.3390/app14051717; PNER: Applying the Pipeline Method to Resolve Nested Issues in Named Entity Recognition; Hongjian Yang, Qinghao Zhang, Hyuk-Chul Kwon (all: Center for Artificial Intelligence Research, Pusan National University, Busan 46241, Republic of Korea); https://www.mdpi.com/2076-3417/14/5/1717; nested entity; named entity recognition; NER; sequence labeling; text classification; merged label
title	PNER: Applying the Pipeline Method to Resolve Nested Issues in Named Entity Recognition
topic	nested entity; named entity recognition; NER; sequence labeling; text classification; merged label
url	https://www.mdpi.com/2076-3417/14/5/1717

Similar Items