Using machine learning for recognition of text patterns of literary sources

Bibliographic Details
Main Authors: V.S. Tomashevskaya, Yu.V. Starichkova, D.A. Yakovlev
Format: Article
Language: English
Published: Penza State University Publishing House 2022-12-01
Series: Известия высших учебных заведений. Поволжский регион:Технические науки
Subjects: natural language processing; machine learning; naive Bayes classifier; logistic regression
collection DOAJ
description Background. Natural language processing technologies in artificial intelligence address problems such as machine translation, text sentiment analysis, and text classification. This article considers the application of machine learning and data mining methods to the problem of recognizing text patterns. The object of the study is the types of literary sources. The subject of the research is the classification of literary sources using machine learning methods. The purpose of the work is to compare the effectiveness of machine learning methods in the binary classification of literary sources and to identify the distinctive features inherent in each of them. Materials and methods. Literary sources were classified using the Naive Bayes classifier and logistic regression, with the Bag of Words and TF-IDF methods used to represent the text. Results. A comparative analysis of the resulting models was carried out. The model that combines logistic regression with the Bag of Words method was the most effective. Conclusions. Logistic regression combined with the Bag of Words method was the most effective on text templates, while stemming and lemmatization did not affect the final effectiveness score of the model. The second type of literary source contains text constructions unique to it, such as “[Electronic resource]” or “date of access”, which increase the chance of correct classification.
first_indexed 2024-04-11T04:41:24Z
id doaj.art-889ae2e061944daebade84e7d4bc99ba
institution Directory Open Access Journal
issn 2072-3059
last_indexed 2024-04-11T04:41:24Z
spelling doaj.art-889ae2e061944daebade84e7d4bc99ba; 2022-12-28T05:11:35Z; eng; Penza State University Publishing House; Известия высших учебных заведений. Поволжский регион:Технические науки; 2072-3059; 2022-12-01; 3; 10.21685/2072-3059-2022-3-2; Using machine learning for recognition of text patterns of literary sources; V.S. Tomashevskaya, Yu.V. Starichkova, D.A. Yakovlev (MIREA – Russian Technological University); natural language processing; machine learning; naive bayes classifier; logistic regression
topic natural language processing
machine learning
naive bayes classifier
logistic regression
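
The abstract reports that logistic regression combined with the Bag of Words method worked best, and that constructions such as “[Electronic resource]” and “date of access” help identify the second type of source. The following is a minimal sketch of that idea, not the authors' implementation: the reference strings are made-up toy examples, the labels (0 for a print source, 1 for an electronic source) are an assumed encoding, and the helper names, learning rate, and epoch count are all hypothetical. It uses only the Python standard library and a plain gradient-descent logistic regression.

```python
import math
import re
from collections import Counter

# Made-up toy references (NOT the authors' data).
# Assumed labels: 0 = print source, 1 = electronic source.
TRAIN = [
    ("Ivanov I.I. Machine learning methods. Moscow: Nauka, 2019. 320 p.", 0),
    ("Petrov P.P. Text classification // Computer Science Journal. 2020. No. 3. P. 15-27.", 0),
    ("Smith J. Natural language processing. London: Academic Press, 2018. 210 p.", 0),
    ("Sidorov S.S. Neural networks [Electronic resource]. URL: http://example.org/nn (date of access: 12.05.2021).", 1),
    ("Data mining overview [Electronic resource]. URL: http://example.org/dm (date of access: 01.02.2022).", 1),
    ("Brown A. Logistic regression guide [Electronic resource]. URL: http://example.org/lr (date of access: 03.03.2020).", 1),
]


def tokenize(text):
    """Lowercase the text and keep runs of Latin letters only."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(texts):
    """Sorted list of every distinct token seen in the training texts."""
    return sorted({tok for text in texts for tok in tokenize(text)})


def bag_of_words(text, vocab):
    """Count vector over the vocabulary; unseen tokens are ignored."""
    counts = Counter(tokenize(text))
    return [counts[tok] for tok in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the logistic loss (no regularization)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(text, vocab, w, b):
    """Return 1 if the model scores the text as an electronic source."""
    x = bag_of_words(text, vocab)
    return int(sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5)


texts = [text for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
vocab = build_vocab(texts)
X = [bag_of_words(text, vocab) for text in texts]
weights, bias = train_logreg(X, labels)

# Tokens with the largest positive weights push a reference toward class 1.
for weight, token in sorted(zip(weights, vocab), reverse=True)[:5]:
    print(f"{token}: {weight:.3f}")
```

On this toy data, tokens that occur only in the electronic references (for example "electronic" or "access") receive positive weights, which mirrors the abstract's observation about constructions unique to the second type of source. A real experiment would use a labeled corpus and, most likely, an established library; which tooling the authors used is not stated in this record.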