AttenSy-SNER: software knowledge entity extraction with syntactic features and semantic augmentation information

Abstract Software knowledge communities contain large volumes of software knowledge entity information with complex structure and rich semantic correlations. Recognizing and extracting software knowledge entities from these communities is important because it has a great impact on entity-centric ta...

Bibliographic Details
Main Authors: Mingjing Tang, Tong Li, Wei Gao, Yu Xia
Format: Article
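Language: English
Published: Springer 2022-06-01
Series: Complex & Intelligent Systems
Subjects: Entity extraction; Software knowledge graph; Attention mechanism; Syntactic dependency analysis; Semantic augmentation
Online Access: https://doi.org/10.1007/s40747-022-00742-5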
author Mingjing Tang
Tong Li
Wei Gao
Yu Xia
collection DOAJ
description Abstract Software knowledge communities contain large volumes of software knowledge entity information with complex structure and rich semantic correlations. Recognizing and extracting software knowledge entities from these communities is important because it has a great impact on entity-centric tasks such as software knowledge graph construction, software document generation and expert recommendation. Because the texts in software knowledge communities are unstructured and user-generated, traditional entity extraction methods are difficult to apply in this domain owing to entity variation, entity sparsity, entity ambiguity, out-of-vocabulary (OOV) words and the lack of annotated data sets. This paper proposes a novel software knowledge entity extraction model, named AttenSy-SNER, which integrates syntactic features and semantic augmentation information to extract fine-grained software knowledge entities from unstructured user-generated content. The input representation layer uses the Bidirectional Encoder Representations from Transformers (BERT) model to extract the feature representation of the input sequence. The contextual coding layer uses a Bidirectional Long Short-Term Memory (BiLSTM) network to capture contextual information and a Graph Convolutional Network (GCN) to capture syntactic dependency information, and it introduces an attention-based semantic augmentation strategy to enrich the semantic feature representation of sequences. The tag decoding layer uses Conditional Random Fields (CRF) to model the dependencies between output tags and obtain the globally optimal label sequence. Model comparison experiments show that the proposed model outperforms the benchmark model in the software engineering domain.
first_indexed 2024-04-09T22:32:17Z
format Article
id doaj.art-ea8c5ad1cafd44f492aacd8f32723dec
institution Directory Open Access Journal
issn 2199-4536
2198-6053
language English
last_indexed 2024-04-09T22:32:17Z
publishDate 2022-06-01
publisher Springer
record_format Article
series Complex & Intelligent Systems
spelling doaj.art-ea8c5ad1cafd44f492aacd8f32723dec 2023-03-22T12:43:45Z eng Springer Complex & Intelligent Systems 2199-4536 2198-6053 2022-06-01 9 (1): 25-39 10.1007/s40747-022-00742-5
AttenSy-SNER: software knowledge entity extraction with syntactic features and semantic augmentation information
Mingjing Tang (School of Life Sciences, Yunnan Normal University)
Tong Li (School of Big Data, Yunnan Agricultural University)
Wei Gao (School of Information, Yunnan Normal University)
Yu Xia (School of Information, Yunnan Normal University)
https://doi.org/10.1007/s40747-022-00742-5
Entity extraction; Software knowledge graph; Attention mechanism; Syntactic dependency analysis; Semantic augmentation
title AttenSy-SNER: software knowledge entity extraction with syntactic features and semantic augmentation information
topic Entity extraction
Software knowledge graph
Attention mechanism
Syntactic dependency analysis
Semantic augmentation
url https://doi.org/10.1007/s40747-022-00742-5
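
Illustrative note on the tag decoding layer: the abstract states that a CRF is used to obtain the globally optimal label sequence. As a minimal sketch of that decoding step only (not the authors' implementation), the Python function below runs Viterbi decoding over per-token emission scores and tag-to-tag transition scores. The function name `viterbi_decode`, the tag set, the example tokens and all scores are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence for a linear-chain model.

    emissions[t][tag] is the score of `tag` at position t.
    transitions[(prev, cur)] is the score of moving from `prev` to `cur`;
    pairs that are not listed default to 0.0.
    """
    if not emissions:
        return []
    # best[t][tag]: best total score of any path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    backptr: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            # Pick the previous tag that maximizes path score plus transition score
            prev = max(tags, key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = best[-1][prev] + transitions.get((prev, cur), 0.0) + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        backptr.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(backptr):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical scores for the three tokens "use", "Apache", "Spark"
tags = ["O", "B-LIB", "I-LIB"]
emissions = [
    {"O": 2.0, "B-LIB": 0.5, "I-LIB": 0.1},
    {"O": 0.5, "B-LIB": 1.5, "I-LIB": 1.0},
    {"O": 1.3, "B-LIB": 0.4, "I-LIB": 1.2},
]
transitions = {("O", "I-LIB"): -10.0, ("B-LIB", "I-LIB"): 1.0}

print(viterbi_decode(emissions, transitions, tags))  # ['O', 'B-LIB', 'I-LIB']
```

In this toy input, choosing the best tag per token independently would label the last token "O" (1.3 versus 1.2), whereas the transition score from "B-LIB" to "I-LIB" makes the jointly best sequence end in "I-LIB". That is the sense in which sequence-level decoding accounts for dependencies between output tags.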