Paraphrase Identification with Lexical, Syntactic and Sentential Encodings

Bibliographic Details
Main Authors: Sheng Xu, Xingfa Shen, Fumiyo Fukumoto, Jiyi Li, Yoshimi Suzuki, Hiromitsu Nishizaki
Format: Article
Language: English
Published: MDPI AG 2020-06-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/10/12/4144

Abstract

Paraphrase identification has been one of the major topics in Natural Language Processing (NLP). However, how to interpret diverse contexts within a sentence, such as lexical and semantic information, as relevant features is still an open problem. This paper addresses the problem and presents an approach for leveraging contextual features with a neural learning model. Our Lexical, Syntactic, and Sentential Encodings (LSSE) learning model incorporates Relational Graph Convolutional Networks (R-GCNs) to make use of different features from local contexts, i.e., word encoding, position encoding, and full dependency structures. By utilizing the hidden states obtained by the R-GCNs together with lexical and sentential encodings from Bidirectional Encoder Representations from Transformers (BERT), our model learns the contextual similarity between sentences effectively. Experimental results on two benchmark datasets, the Microsoft Research Paraphrase Corpus (MRPC) and Quora Question Pairs (QQP), show that the model improves on the baseline, a BERT sentential encodings model, by 1.7% F1-score on MRPC and 1.0% F1-score on QQP. Moreover, we verified that combining position encoding with syntactic features contributes to the performance improvement.
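
The abstract names R-GCNs over full dependency structures, fed with word and position encodings, as the syntactic component of LSSE. The following is a toy, pure-Python sketch of a single R-GCN layer under stated assumptions; it is not the authors' implementation. It assumes the generic R-GCN propagation rule (a self-connection weight W_0 plus one weight matrix per dependency relation, with neighbour messages averaged per relation), Transformer-style sinusoidal position encodings added to the word encodings, a ReLU activation, and no inverse edges. The function names (`sinusoidal_position`, `matvec`, `rgcn_layer`), the toy parse of "The cat sleeps", and all vectors and weights are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Edge = Tuple[int, str, int]  # (source, relation, target), e.g. (head, "nsubj", dependent)


def sinusoidal_position(pos: int, dim: int) -> Vector:
    """Transformer-style sinusoidal position encoding (an assumption, for illustration)."""
    return [
        math.sin(pos / 10000 ** (k / dim)) if k % 2 == 0
        else math.cos(pos / 10000 ** ((k - 1) / dim))
        for k in range(dim)
    ]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a matrix, given as a list of rows, by a vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def rgcn_layer(h: List[Vector], edges: List[Edge],
               w_rel: Dict[str, Matrix], w_self: Matrix) -> List[Vector]:
    """One generic R-GCN layer:
    h_i' = ReLU(W_0 h_i + sum_r sum_{j in N_i^r} (1 / c_{i,r}) W_r h_j),
    with c_{i,r} = |N_i^r|, the number of incoming r-edges of node i.
    """
    # incoming[i][r] lists the nodes j that have an edge j --r--> i.
    incoming: Dict[int, Dict[str, List[int]]] = {}
    for src, rel, dst in edges:
        incoming.setdefault(dst, {}).setdefault(rel, []).append(src)

    out: List[Vector] = []
    for i, h_i in enumerate(h):
        acc = matvec(w_self, h_i)  # self-connection term W_0 h_i
        for rel, srcs in incoming.get(i, {}).items():
            c_ir = len(srcs)  # per-relation normalization constant
            for j in srcs:
                msg = matvec(w_rel[rel], h[j])  # relation-specific transform W_r h_j
                acc = [a + m / c_ir for a, m in zip(acc, msg)]
        out.append([max(0.0, a) for a in acc])  # ReLU
    return out


# Toy example: "The cat sleeps", with hypothetical 2-dimensional word encodings.
word_vecs = [[0.1, 0.3], [0.5, -0.2], [0.4, 0.9]]
# Input node features: word encoding + position encoding (summed, by assumption).
h0 = [[w + p for w, p in zip(vec, sinusoidal_position(i, 2))]
      for i, vec in enumerate(word_vecs)]
# Dependency arcs point from head to dependent: sleeps -nsubj-> cat, cat -det-> The.
edges = [(2, "nsubj", 1), (1, "det", 0)]
identity = [[1.0, 0.0], [0.0, 1.0]]
h1 = rgcn_layer(h0, edges, {"nsubj": identity, "det": identity}, identity)
```

Because every relation gets its own weight matrix, an `nsubj` arc and a `det` arc can transform the same neighbour differently. In this toy run all matrices are the identity, so `h1[1]` is simply `h0[1] + h0[2]`, and "sleeps", which has no incoming arc, keeps its input features. A trained model would learn these weights and, per the abstract, combine the resulting hidden states with BERT lexical and sentential encodings; this sketch does not attempt to reproduce that combination.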

Additional Details
Journal: Applied Sciences, Volume 10, Issue 12, Article 4144
ISSN: 2076-3417
DOI: 10.3390/app10124144
Affiliations: School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China (Sheng Xu, Xingfa Shen); Graduate Faculty of Interdisciplinary Research, University of Yamanashi, Kofu 400-8511, Japan (Fumiyo Fukumoto, Jiyi Li, Yoshimi Suzuki, Hiromitsu Nishizaki)
Keywords: paraphrase identification; encodings; R-GCNs; BERT; contextual features
Source: Directory of Open Access Journals (DOAJ)