Paraphrase Identification with Lexical, Syntactic and Sentential Encodings

Paraphrase identification has been one of the major topics in Natural Language Processing (NLP). However, how to interpret a diversity of contexts such as lexical and semantic information within a sentence as relevant features is still an open problem. This paper addresses the problem and presents a...

Bibliographic Details
Main Authors:	Sheng Xu, Xingfa Shen, Fumiyo Fukumoto, Jiyi Li, Yoshimi Suzuki, Hiromitsu Nishizaki
Format:	Article
Language:	English
Published:	MDPI AG 2020-06-01
Series:	Applied Sciences
Subjects:	paraphrase identification; encodings; R-GCNs; BERT; contextual features
Online Access:	https://www.mdpi.com/2076-3417/10/12/4144

_version_	1797565172725841920
author	Sheng Xu, Xingfa Shen, Fumiyo Fukumoto, Jiyi Li, Yoshimi Suzuki, Hiromitsu Nishizaki
author_sort	Sheng Xu
collection	DOAJ
description	Paraphrase identification has been one of the major topics in Natural Language Processing (NLP). However, how to interpret diverse contexts, such as lexical and semantic information within a sentence, as relevant features is still an open problem. This paper addresses the problem and presents an approach for leveraging contextual features with a neural-based learning model. Our Lexical, Syntactic, and Sentential Encodings (LSSE) learning model incorporates Relational Graph Convolutional Networks (R-GCNs) to make use of different features from local contexts, i.e., word encoding, position encoding, and full dependency structures. By utilizing the hidden states obtained by the R-GCNs as well as lexical and sentential encodings from Bidirectional Encoder Representations from Transformers (BERT), our model learns the contextual similarity between sentences effectively. Experimental results on two benchmark datasets, the Microsoft Research Paraphrase Corpus (MRPC) and Quora Question Pairs (QQP), show that our model improves over the baseline, a BERT sentential encodings model, by 1.7% F1-score on MRPC and 1.0% F1-score on QQP. Moreover, we verified that the combination of position encoding and syntactic features contributes to the performance improvement.
first_indexed	2024-03-10T19:08:12Z
format	Article
id	doaj.art-7f0756872d044e4f995e83aacfe1a254
institution	Directory of Open Access Journals
issn	2076-3417
language	English
last_indexed	2024-03-10T19:08:12Z
publishDate	2020-06-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj.art-7f0756872d044e4f995e83aacfe1a254; 2023-11-20T04:01:54Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2020-06-01; vol. 10, no. 12, article 4144; DOI 10.3390/app10124144; Paraphrase Identification with Lexical, Syntactic and Sentential Encodings; Sheng Xu, Xingfa Shen (School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China); Fumiyo Fukumoto, Jiyi Li, Yoshimi Suzuki, Hiromitsu Nishizaki (Graduate Faculty of Interdisciplinary Research, University of Yamanashi, Kofu 400-8511, Japan)
title	Paraphrase Identification with Lexical, Syntactic and Sentential Encodings
topic	paraphrase identification; encodings; R-GCNs; BERT; contextual features
url	https://www.mdpi.com/2076-3417/10/12/4144

Similar Items