RESEARCH ON THE SPECIFIC FEATURES OF DETERMINING THE SEMANTIC SIMILARITY OF ARBITRARY-LENGTH TEXT CONTENT USING MULTILINGUAL TRANSFORMER-BASED MODELS
The possibilities of determining the semantic similarity of multilingual arbitrary-length text content have been investigated using their vector representations obtained within different multilingual models based on the Transformer architecture. A comparative analysis of the Transformers has been perfor...
Main Authors: | Serhii Olizarenko, Vladimir Argunov |
---|---|
Format: | Article |
Language: | English |
Published: | National Technical University "Kharkiv Polytechnic Institute", 2020-10-01 |
Series: | Сучасні інформаційні системи |
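Subjects: | Natural Language Processing; BERT; semantic similarities; news content; Deep Learning; multilingual text content |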
Online Access: | http://ais.khpi.edu.ua/article/view/213331 |
_version_ | 1818600267498651648 |
---|---|
author | Serhii Olizarenko; Vladimir Argunov |
author_facet | Serhii Olizarenko; Vladimir Argunov |
author_sort | Serhii Olizarenko |
collection | DOAJ |
description | The possibilities of determining the semantic similarity of multilingual arbitrary-length text content have been investigated using their vector representations obtained within different multilingual models based on the Transformer architecture. A comparative analysis of the Transformers has been performed to select the most advantageous model for this class of problems. In addition, two new approaches to determining the semantic similarity of multilingual text content have been developed for use in the HIPSTO Open AI Information Discovery Platform, the challenge being to allow arbitrary text length. Experimental and research evidence is offered to support the new approaches as a solution to the semantic similarity problem. |
first_indexed | 2024-12-16T12:32:46Z |
format | Article |
id | doaj.art-12c7951053b64867b61afbeab3d59aa7 |
institution | Directory Open Access Journal |
issn | 2522-9052 |
language | English |
last_indexed | 2024-12-16T12:32:46Z |
publishDate | 2020-10-01 |
publisher | National Technical University "Kharkiv Polytechnic Institute" |
record_format | Article |
series | Сучасні інформаційні системи |
spelling | doaj.art-12c7951053b64867b61afbeab3d59aa7; 2022-12-21T22:31:39Z; eng; National Technical University "Kharkiv Polytechnic Institute"; Сучасні інформаційні системи; ISSN 2522-9052; 2020-10-01; volume 4, issue 3; DOI 10.20998/2522-9052.2020.3.13; RESEARCH ON THE SPECIFIC FEATURES OF DETERMINING THE SEMANTIC SIMILARITY OF ARBITRARY-LENGTH TEXT CONTENT USING MULTILINGUAL TRANSFORMER-BASED MODELS; Serhii Olizarenko (Kharkiv National University of Radio Electronics, Kharkiv); Vladimir Argunov (HIPSTO, Kharkiv); The possibilities of determining the semantic similarity of multilingual arbitrary-length text content have been investigated using their vector representations obtained within different multilingual models based on the Transformer architecture. A comparative analysis of the Transformers has been performed to select the most advantageous model for this class of problems. In addition, two new approaches to determining the semantic similarity of multilingual text content have been developed for use in the HIPSTO Open AI Information Discovery Platform, the challenge being to allow arbitrary text length. Experimental and research evidence is offered to support the new approaches as a solution to the semantic similarity problem.; http://ais.khpi.edu.ua/article/view/213331; Natural Language Processing; BERT; semantic similarities; news content; Deep Learning; multilingual text content |
title | RESEARCH ON THE SPECIFIC FEATURES OF DETERMINING THE SEMANTIC SIMILARITY OF ARBITRARY-LENGTH TEXT CONTENT USING MULTILINGUAL TRANSFORMER-BASED MODELS |
topic | Natural Language Processing; BERT; semantic similarities; news content; Deep Learning; multilingual text content |
url | http://ais.khpi.edu.ua/article/view/213331 |
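
The description above concerns comparing texts of arbitrary length through vector representations, although Transformer encoders accept only a bounded input. The record does not say how the two new approaches work, so the following is only a minimal sketch of one common workaround and not the authors' method: split each text into overlapping word windows, embed every window, average the window vectors, and compare the averages by cosine similarity. All names here (`embed_chunk`, `split_into_chunks`, `embed_text`, `semantic_similarity`, `DIM`) and the window sizes are hypothetical. `embed_chunk` is a hashed character-trigram placeholder standing in for a multilingual Transformer encoder; it carries no real semantics and would need to be replaced by an actual model.

```python
import hashlib
import math
from typing import List

DIM = 64  # hypothetical embedding size; real encoders are typically larger


def embed_chunk(chunk: str) -> List[float]:
    """Placeholder encoder: hashed character trigrams, L2-normalised.

    Stands in for a multilingual Transformer sentence encoder. It is NOT
    a model from the article and does not capture meaning.
    """
    vec = [0.0] * DIM
    text = chunk.lower()
    for i in range(max(len(text) - 2, 1)):
        digest = hashlib.md5(text[i:i + 3].encode("utf-8")).digest()
        vec[digest[0] % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def split_into_chunks(text: str, size: int = 50, stride: int = 25) -> List[str]:
    """Cut text into overlapping word windows of at most `size` words."""
    words = text.split()
    if len(words) <= size:
        return [" ".join(words)]
    chunks = []
    for start in range(0, len(words), stride):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed_text(text: str) -> List[float]:
    """Mean-pool the chunk vectors into one fixed-size document vector."""
    vectors = [embed_chunk(c) for c in split_into_chunks(text)]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the pooled vectors of two texts."""
    return cosine_similarity(embed_text(text_a), embed_text(text_b))
```

Mean pooling is chosen here only because it yields a fixed-size vector regardless of text length; other aggregation schemes (for example, comparing chunks pairwise and taking the maximum) are equally plausible and may behave differently on long news articles.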