RESEARCH ON THE SPECIFIC FEATURES OF DETERMINING THE SEMANTIC SIMILARITY OF ARBITRARY-LENGTH TEXT CONTENT USING MULTILINGUAL TRANSFORMER-BASED MODELS

The possibilities of determining the semantic similarity of multilingual text content of arbitrary length have been investigated using vector representations obtained from different multilingual models based on the Transformer architecture. A comparative analysis of the Transformers has been perfor...

Bibliographic Details
Main Authors: Serhii Olizarenko, Vladimir Argunov
Format: Article
Language: English
Published: National Technical University "Kharkiv Polytechnic Institute" 2020-10-01
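Series: Сучасні інформаційні системи
Subjects: Natural Language Processing; BERT; semantic similarities; news content; Deep Learning; multilingual text content
Online Access: http://ais.khpi.edu.ua/article/view/213331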
author Serhii Olizarenko
Vladimir Argunov
author_facet Serhii Olizarenko
Vladimir Argunov
author_sort Serhii Olizarenko
collection DOAJ
description The possibilities of determining the semantic similarity of multilingual text content of arbitrary length have been investigated using vector representations obtained from different multilingual models based on the Transformer architecture. A comparative analysis of these Transformer models has been performed to select the most suitable model for this class of problems. In addition, two new approaches to determining the semantic similarity of multilingual text content have been developed for use in the HIPSTO Open AI Information Discovery Platform, the main challenge being to support arbitrary text length. Experimental and research evidence is offered to support the new approaches as a solution to the semantic similarity problem.
first_indexed 2024-12-16T12:32:46Z
format Article
id doaj.art-12c7951053b64867b61afbeab3d59aa7
institution Directory Open Access Journal
issn 2522-9052
language English
last_indexed 2024-12-16T12:32:46Z
publishDate 2020-10-01
publisher National Technical University "Kharkiv Polytechnic Institute"
record_format Article
series Сучасні інформаційні системи
spelling doaj.art-12c7951053b64867b61afbeab3d59aa7 2022-12-21T22:31:39Z eng
National Technical University "Kharkiv Polytechnic Institute"
Сучасні інформаційні системи
2522-9052
2020-10-01
43
10.20998/2522-9052.2020.3.13
RESEARCH ON THE SPECIFIC FEATURES OF DETERMINING THE SEMANTIC SIMILARITY OF ARBITRARY-LENGTH TEXT CONTENT USING MULTILINGUAL TRANSFORMER-BASED MODELS
Serhii Olizarenko (0), Vladimir Argunov (1)
(0) Kharkiv National University of Radio Electronics, Kharkiv
(1) HIPSTO, Kharkiv
The possibilities of determining the semantic similarity of multilingual text content of arbitrary length have been investigated using vector representations obtained from different multilingual models based on the Transformer architecture. A comparative analysis of these Transformer models has been performed to select the most suitable model for this class of problems. In addition, two new approaches to determining the semantic similarity of multilingual text content have been developed for use in the HIPSTO Open AI Information Discovery Platform, the main challenge being to support arbitrary text length. Experimental and research evidence is offered to support the new approaches as a solution to the semantic similarity problem.
http://ais.khpi.edu.ua/article/view/213331
Natural Language Processing; BERT; semantic similarities; news content; Deep Learning; multilingual text content
title RESEARCH ON THE SPECIFIC FEATURES OF DETERMINING THE SEMANTIC SIMILARITY OF ARBITRARY-LENGTH TEXT CONTENT USING MULTILINGUAL TRANSFORMER-BASED MODELS
topic Natural Language Processing
BERT
semantic similarities
news content
Deep Learning
multilingual text content
url http://ais.khpi.edu.ua/article/view/213331