Reassembling Fragmented Entity Names: A Novel Model for Chinese Compound Noun Processing

Bibliographic Details
Main Authors: Yuze Pan, Xiaofeng Fu
Format: Article
Language: English
Published: MDPI AG 2023-10-01
Series: Electronics
Online Access: https://www.mdpi.com/2079-9292/12/20/4251

Abstract: In the process of classifying intelligent assets, we encountered challenges with a limited dataset dominated by complex compound noun phrases. Training classifiers directly on this dataset posed risks of overfitting and potential misinterpretations due to inherent ambiguities in these phrases. Recognizing the gap in the current literature for tailored methods addressing this challenge, this paper introduces a refined approach for the accurate extraction of entity names from such structures. We leveraged the Chinese pre-trained BERT model combined with an attention mechanism, ensuring precise interpretation of each token’s significance. This was followed by employing both a multi-layer perceptron (MLP) and an LSTM-based Sequence Parsing Model, tailored for sequence annotation and rule-based parsing. With the aid of a rule-driven decoder, we reconstructed comprehensive entity names. Our approach adeptly extracts structurally coherent entity names from fragmented compound noun phrases. Experiments on a manually annotated dataset of compound noun phrases demonstrate that our model consistently outperforms rival methodologies. These results compellingly validate our method’s superiority in extracting entity names from compound noun phrases.
Author Affiliations: Yuze Pan (School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China); Xiaofeng Fu (School of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, China)
Citation: Electronics, vol. 12, no. 20, article 4251 (2023-10-01)
DOI: 10.3390/electronics12204251
ISSN: 2079-9292
Keywords: compound noun phrases; entity name extraction; fragmentation; sequence labeling; sequence parsing