Improving Sentence Representations via Component Focusing

Bibliographic Details
Main Authors: Xiaoya Yin, Wu Zhang, Wenhao Zhu, Shuang Liu, Tengjun Yao
Affiliations: School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China (Yin, Zhang, Zhu, Liu); The 36th Institute, China Electronics Technology Group Corporation, Jiaxing 314000, China (Yao)
Format: Article
Language: English
Published: MDPI AG, 2020-02-01
Series: Applied Sciences
ISSN: 2076-3417
DOI: 10.3390/app10030958
Subjects: natural language processing; sentence representation; sentence embedding; component focusing; semantic textual similarity
Online Access: https://www.mdpi.com/2076-3417/10/3/958
Description: The efficiency of natural language processing (NLP) tasks, such as text classification and information retrieval, can be significantly improved with proper sentence representations. Neural networks such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are increasingly applied to learn sentence representations and are well suited to processing sequences. Recently, bidirectional encoder representations from transformers (BERT) has attracted much attention because it achieves state-of-the-art performance on various NLP tasks. However, these standard models do not adequately address a general linguistic fact: different sentence components play different roles in the meaning of a sentence. In general, the subject, predicate, and object play the most crucial roles, as they carry the primary meaning of a sentence. Additionally, the words in a sentence are related to each other by syntactic relations. To address these issues, we propose a sentence representation model, a modification of the pre-trained BERT network via component focusing (CF-BERT). The sentence representation consists of a basic part, which refers to the complete sentence, and a component-enhanced part, which focuses on the subject, predicate, object, and their relations. A weight factor is introduced to adjust the ratio of the two parts for the best performance. We evaluate CF-BERT on two different tasks: semantic textual similarity and entailment classification. Results show that CF-BERT yields a significant performance gain compared to other sentence representation methods.
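
The abstract does not give the exact formulation, but it describes a weighted combination of a full-sentence embedding with an embedding of the focused components. Below is a minimal sketch of that idea, assuming mean-pooled BERT token embeddings; the weight factor `lam` and the helper `extract_components` are hypothetical stand-ins (the paper presumably uses a syntactic parser to pull out the subject, predicate, and object), not the authors' actual implementation.

```python
# Sketch of a component-focused sentence embedding: a weighted sum of a
# basic (full-sentence) part and a component-enhanced part, as the
# abstract describes. `lam` and `extract_components` are assumptions.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def mean_pooled_embedding(text: str) -> torch.Tensor:
    """Encode text with BERT and mean-pool the token embeddings."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # last_hidden_state: (1, seq_len, hidden); mask out non-content tokens.
    mask = inputs["attention_mask"].unsqueeze(-1)
    summed = (outputs.last_hidden_state * mask).sum(dim=1)
    return summed / mask.sum(dim=1)

def extract_components(text: str) -> str:
    """Placeholder for the paper's component extraction; a real version
    would use a dependency parse to keep subject, predicate, and object."""
    return text

def cf_embedding(text: str, lam: float = 0.5) -> torch.Tensor:
    """Weighted combination of the basic and component-enhanced parts."""
    basic = mean_pooled_embedding(text)
    enhanced = mean_pooled_embedding(extract_components(text))
    return lam * basic + (1.0 - lam) * enhanced

# Usage: score a sentence pair (as in semantic textual similarity tasks).
a = cf_embedding("The cat chased the mouse.")
b = cf_embedding("A mouse was pursued by the cat.")
print(torch.nn.functional.cosine_similarity(a, b).item())
```

In this sketch the weight factor interpolates between the two parts; the paper tunes such a factor to balance the complete sentence against its core components.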