Paraphrase Identification with Lexical, Syntactic and Sentential Encodings
Paraphrase identification has been one of the major topics in Natural Language Processing (NLP). However, how to interpret a diversity of contexts such as lexical and semantic information within a sentence as relevant features is still an open problem. This paper addresses the problem and presents a...
Main Authors: | Sheng Xu, Xingfa Shen, Fumiyo Fukumoto, Jiyi Li, Yoshimi Suzuki, Hiromitsu Nishizaki |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-06-01 |
Series: | Applied Sciences |
Subjects: | paraphrase identification, encodings, R-GCNs, BERT, contextual features |
Online Access: | https://www.mdpi.com/2076-3417/10/12/4144 |
_version_ | 1797565172725841920 |
---|---|
author | Sheng Xu Xingfa Shen Fumiyo Fukumoto Jiyi Li Yoshimi Suzuki Hiromitsu Nishizaki |
author_facet | Sheng Xu Xingfa Shen Fumiyo Fukumoto Jiyi Li Yoshimi Suzuki Hiromitsu Nishizaki |
author_sort | Sheng Xu |
collection | DOAJ |
description | Paraphrase identification is one of the major topics in Natural Language Processing (NLP). However, how to interpret diverse contexts, such as lexical and semantic information within a sentence, as relevant features is still an open problem. This paper addresses the problem and presents an approach for leveraging contextual features with a neural learning model. Our Lexical, Syntactic, and Sentential Encodings (LSSE) learning model incorporates Relational Graph Convolutional Networks (R-GCNs) to make use of different features from local contexts, i.e., word encoding, position encoding, and full dependency structures. By utilizing the hidden states obtained by the R-GCNs together with the lexical and sentential encodings produced by Bidirectional Encoder Representations from Transformers (BERT), our model learns the contextual similarity between sentences effectively. Experimental results on two benchmark datasets, the Microsoft Research Paraphrase Corpus (MRPC) and Quora Question Pairs (QQP), show that the model improves on the baseline, a BERT sentential encodings model, by 1.7% F1-score on MRPC and 1.0% F1-score on QQP. Moreover, we verified that the combination of position encoding and syntactic features contributes to the performance improvement. |
first_indexed | 2024-03-10T19:08:12Z |
format | Article |
id | doaj.art-7f0756872d044e4f995e83aacfe1a254 |
institution | Directory of Open Access Journals |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-10T19:08:12Z |
publishDate | 2020-06-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-7f0756872d044e4f995e83aacfe1a254; 2023-11-20T04:01:54Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2020-06-01; 10; 12; 4144; 10.3390/app10124144; Paraphrase Identification with Lexical, Syntactic and Sentential Encodings; Sheng Xu [0]; Xingfa Shen [1]; Fumiyo Fukumoto [2]; Jiyi Li [3]; Yoshimi Suzuki [4]; Hiromitsu Nishizaki [5]; [0] School of Computer Science and Technology, Hangzhou Dianzi University, HangZhou 310018, China; [1] School of Computer Science and Technology, Hangzhou Dianzi University, HangZhou 310018, China; [2] Graduate Faculty of Interdisciplinary Research, University of Yamanashi, Kofu 400-8511, Japan; [3] Graduate Faculty of Interdisciplinary Research, University of Yamanashi, Kofu 400-8511, Japan; [4] Graduate Faculty of Interdisciplinary Research, University of Yamanashi, Kofu 400-8511, Japan; [5] Graduate Faculty of Interdisciplinary Research, University of Yamanashi, Kofu 400-8511, Japan; Paraphrase identification has been one of the major topics in Natural Language Processing (NLP). However, how to interpret a diversity of contexts such as lexical and semantic information within a sentence as relevant features is still an open problem. This paper addresses the problem and presents an approach for leveraging contextual features with a neural-based learning model. Our Lexical, Syntactic, and Sentential Encodings (LSSE) learning model incorporates Relational Graph Convolutional Networks (R-GCNs) to make use of different features from local contexts, i.e., word encoding, position encoding, and full dependency structures. By utilizing the hidden states obtained by the R-GCNs as well as lexical and sentential encodings by Bidirectional Encoder Representations from Transformers (BERT), our model learns the contextual similarity between sentences effectively. The experimental results by using the two benchmark datasets, Microsoft Research Paraphrase Corpus (MRPC) and Quora Question Pairs (QQP) show that the improvement compared with the baseline, BERT sentential encodings model, was 1.7% F1-score on MRPC and 1.0% F1-score on QQP. Moreover, we verified that the combination of position encoding and syntactic features contributes to performance improvement.; https://www.mdpi.com/2076-3417/10/12/4144; paraphrase identification; encodings; R-GCNs; BERT; contextual features |
spellingShingle | Sheng Xu Xingfa Shen Fumiyo Fukumoto Jiyi Li Yoshimi Suzuki Hiromitsu Nishizaki Paraphrase Identification with Lexical, Syntactic and Sentential Encodings Applied Sciences paraphrase identification encodings R-GCNs BERT contextual features |
title | Paraphrase Identification with Lexical, Syntactic and Sentential Encodings |
title_full | Paraphrase Identification with Lexical, Syntactic and Sentential Encodings |
title_fullStr | Paraphrase Identification with Lexical, Syntactic and Sentential Encodings |
title_full_unstemmed | Paraphrase Identification with Lexical, Syntactic and Sentential Encodings |
title_short | Paraphrase Identification with Lexical, Syntactic and Sentential Encodings |
title_sort | paraphrase identification with lexical syntactic and sentential encodings |
topic | paraphrase identification encodings R-GCNs BERT contextual features |
url | https://www.mdpi.com/2076-3417/10/12/4144 |
work_keys_str_mv | AT shengxu paraphraseidentificationwithlexicalsyntacticandsententialencodings AT xingfashen paraphraseidentificationwithlexicalsyntacticandsententialencodings AT fumiyofukumoto paraphraseidentificationwithlexicalsyntacticandsententialencodings AT jiyili paraphraseidentificationwithlexicalsyntacticandsententialencodings AT yoshimisuzuki paraphraseidentificationwithlexicalsyntacticandsententialencodings AT hiromitsunishizaki paraphraseidentificationwithlexicalsyntacticandsententialencodings |
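
The description above says the LSSE model applies Relational Graph Convolutional Networks (R-GCNs) to dependency structures. This record does not give the authors' architecture or code, so the following is only a minimal sketch of one generic R-GCN propagation step in its commonly cited form, where each node sums a self-connection term and per-relation averaged messages from its neighbours and then applies ReLU. The function names (`matvec`, `rgcn_layer`), the toy two-token sentence, the `nsubj` edge, and all weight values are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rgcn_layer(h, edges, W_rel, W_self):
    """One generic R-GCN step (hypothetical helper, not the authors' code).

    h      : list of node feature vectors, one per token
    edges  : (source, target, relation) triples; messages flow source -> target
    W_rel  : dict mapping a relation label to its weight matrix
    W_self : weight matrix of the self-connection
    """
    # Group each node's incoming neighbours by relation type.
    incoming = defaultdict(lambda: defaultdict(list))
    for src, dst, rel in edges:
        incoming[dst][rel].append(src)

    out = []
    for i, h_i in enumerate(h):
        acc = matvec(W_self, h_i)  # self-connection term
        for rel, srcs in incoming[i].items():
            norm = 1.0 / len(srcs)  # average over neighbours of this relation
            for j in srcs:
                msg = matvec(W_rel[rel], h[j])
                acc = [a + norm * m for a, m in zip(acc, msg)]
        out.append([max(0.0, a) for a in acc])  # ReLU activation
    return out


# Toy input (assumed): tokens "She" (node 0) and "sleeps" (node 1),
# with one dependency edge from the head "sleeps" to its subject "She".
h = [[1.0, 0.0], [0.0, 1.0]]
edges = [(1, 0, "nsubj")]
W_self = [[1.0, 0.0], [0.0, 1.0]]
W_rel = {"nsubj": [[0.5, 0.5], [0.0, 1.0]]}

updated = rgcn_layer(h, edges, W_rel, W_self)
print(updated)
```

In this toy run only node 0 receives a message, so its vector changes while node 1 keeps its self-connection output. A practical model would learn the weight matrices, typically add inverse edges, and combine the resulting hidden states with other encodings such as those from BERT; none of that is shown here.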