Benchmarking Natural Language Inference and Semantic Textual Similarity for Portuguese

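Two sentences can be related in many different ways. Distinct tasks in natural language processing aim to identify different semantic relations between sentences. We developed several models for natural language inference and semantic textual similarity for the Portuguese language. We took advantage of pre-trained models (BERT) and also studied the role of lexical features. We tested our models on several datasets (ASSIN, SICK-BR and ASSIN2), and the best results were usually achieved with ptBERT-Large, trained on a Brazilian corpus and tuned on the latter datasets. Besides obtaining state-of-the-art results, this is, to the best of our knowledge, the most comprehensive study of natural language inference and semantic textual similarity for the Portuguese language.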

Bibliographic Details
Main Authors: Pedro Fialho, Luísa Coheur, Paulo Quaresma
Format: Article
Language: English
Published: MDPI AG 2020-10-01
Series: Information
Subjects: natural language inference; semantic textual similarity; multilingual BERT; lexical features
Online Access: https://www.mdpi.com/2078-2489/11/10/484
ISSN: 2078-2489
DOI: 10.3390/info11100484
Volume/Issue: 11 (10), article 484
Author Affiliation: INESC-ID, Rua Alves Redol 9, 1000-029 Lisboa, Portugal (all three authors)
Collection: DOAJ (Directory of Open Access Journals)
Record ID: doaj.art-f2824336509d43fd8dd5ddd6bdc8fdfc