PNER: Applying the Pipeline Method to Resolve Nested Issues in Named Entity Recognition
Named entity recognition (NER) in natural language processing encompasses three primary types: flat, nested, and discontinuous. While the flat type often garners attention from researchers, nested NER poses a significant challenge. Current approaches to addressing nested NER involve sequence labelin...
Main Authors: | Hongjian Yang, Qinghao Zhang, Hyuk-Chul Kwon |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2024-02-01 |
Series: | Applied Sciences |
Subjects: | nested entity, named entity recognition, NER, sequence labeling, text classification, merged label |
Online Access: | https://www.mdpi.com/2076-3417/14/5/1717 |
_version_ | 1797264870065831936 |
---|---|
author | Hongjian Yang, Qinghao Zhang, Hyuk-Chul Kwon |
author_facet | Hongjian Yang, Qinghao Zhang, Hyuk-Chul Kwon |
author_sort | Hongjian Yang |
collection | DOAJ |
description | Named entity recognition (NER) in natural language processing encompasses three primary types: flat, nested, and discontinuous. While the flat type often receives attention from researchers, nested NER poses a significant challenge. Current approaches to nested NER include sequence labeling methods with merged label layers, cascaded models, and methods rooted in reading comprehension. Among these, sequence labeling with merged label layers stands out for its simplicity and ease of implementation. Yet known issues persist with this method, and we aim to improve its efficacy. In this study, we propose improvements to the sequence labeling approach by employing a pipeline model split into sequence labeling and text classification tasks. Rather than annotating specific entity categories, we merged entity types into main and sub-categories for unified treatment. These categories were then embedded as identifiers in the recognition text for the text classification task. We used BERT+BiLSTM+CRF for sequence labeling and the BERT model for text classification. Experiments were conducted on three nested NER datasets: GENIA, CMeEE, and GermEval 2014, with annotations ranging from four to two levels. Before model training, we conducted separate statistical analyses of nested entities in the medical dataset CMeEE and the everyday-life dataset GermEval 2014. The analyses revealed that a particular entity category consistently dominates within nested entities in both datasets. This observation suggests the potential utility of labeling primary and subsidiary entities for effective category recognition. Model performance was evaluated using F1 scores, counting a prediction as correct only when both the complete entity name and its category were identified. Results showed a substantial performance improvement from our proposed modifications over the original method. Additionally, our improved model was strongly competitive with existing models. F1 scores on the GENIA, CMeEE, and GermEval 2014 datasets reached 79.21, 66.71, and 87.81, respectively. Our research shows that, while preserving the original method's simplicity and ease of implementation, our enhanced model achieves higher performance and is competitive with other methods. |
first_indexed | 2024-04-25T00:35:46Z |
format | Article |
id | doaj.art-b8472369efc747d6b2e8758d9abe9aaa |
institution | Directory of Open Access Journals |
issn | 2076-3417 |
language | English |
last_indexed | 2024-04-25T00:35:46Z |
publishDate | 2024-02-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-b8472369efc747d6b2e8758d9abe9aaa; 2024-03-12T16:38:38Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2024-02-01; vol. 14, no. 5, art. 1717; doi:10.3390/app14051717; Hongjian Yang, Qinghao Zhang, Hyuk-Chul Kwon (all: Center for Artificial Intelligence Research, Pusan National University, Busan 46241, Republic of Korea) |
title | PNER: Applying the Pipeline Method to Resolve Nested Issues in Named Entity Recognition |
title_sort | pner applying the pipeline method to resolve nested issues in named entity recognition |
topic | nested entity, named entity recognition, NER, sequence labeling, text classification, merged label |
url | https://www.mdpi.com/2076-3417/14/5/1717 |
work_keys_str_mv | AT hongjianyang pnerapplyingthepipelinemethodtoresolvenestedissuesinnamedentityrecognition AT qinghaozhang pnerapplyingthepipelinemethodtoresolvenestedissuesinnamedentityrecognition AT hyukchulkwon pnerapplyingthepipelinemethodtoresolvenestedissuesinnamedentityrecognition |
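
Note on the evaluation criterion: the description states that a prediction counts as correct only when both the complete entity name and its category are identified. A minimal Python sketch of such a strict-match F1 is given below. It is an illustration, not the authors' evaluation code. The `(start, end, category)` tuple representation, the function name `strict_f1`, the micro-averaging over a single pooled set, and the sample spans are all assumptions introduced here; the record does not specify them.

```python
from typing import Iterable, Set, Tuple

# Assumed representation of one entity mention: (start, end, category).
# Nested entities are simply spans that overlap or contain one another.
Entity = Tuple[int, int, str]


def strict_f1(gold: Iterable[Entity], pred: Iterable[Entity]) -> float:
    """Strict-match F1: a prediction is a true positive only if its
    boundaries AND its category both equal a gold entity's."""
    gold_set: Set[Entity] = set(gold)
    pred_set: Set[Entity] = set(pred)
    if not gold_set or not pred_set:
        return 0.0
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the span (1, 2) is nested inside (0, 3).
gold = [(0, 3, "protein"), (1, 2, "DNA"), (5, 6, "cell_type")]
# The nested span has the right boundaries but the wrong category,
# so under strict matching it counts as an error.
pred = [(0, 3, "protein"), (1, 2, "RNA"), (5, 6, "cell_type")]

score = strict_f1(gold, pred)
print(round(score, 4))
```

In this example two of the three predictions match a gold entity exactly, so precision and recall are both 2/3. Correct boundaries alone earn no credit, which is what makes category assignment for nested entities matter under this metric.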