Using machine learning for recognition of text patterns of literary sources
Background. Today, in the field of artificial intelligence, there are natural language processing technologies, the purpose of which is to solve problems in such areas as machine translation, text sentiment analysis and text classification. In the article, within the framework of the problem of r...
Main Authors: | V.S. Tomashevskaya, Yu.V. Starichkova, D.A. Yakovlev |
---|---|
Format: | Article |
Language: | English |
Published: | Penza State University Publishing House, 2022-12-01 |
Series: | Известия высших учебных заведений. Поволжский регион:Технические науки |
Subjects: | natural language processing; machine learning; naive Bayes classifier; logistic regression |
_version_ | 1797975813238292480 |
---|---|
author | V.S. Tomashevskaya Yu.V. Starichkova D.A. Yakovlev |
author_sort | V.S. Tomashevskaya |
collection | DOAJ |
description | Background. Natural language processing technologies in artificial intelligence address tasks such as machine translation, text sentiment analysis, and text classification. This article considers the application of machine learning and data mining methods to the problem of recognizing text patterns. The object of the study is the types of literary sources; the subject is the classification of literary sources using machine learning methods. The purpose of the work is to compare the effectiveness of machine learning methods in the binary classification of literary sources and to identify the distinctive features inherent in each of them. Materials and methods. Literary sources were classified using the Naive Bayes classifier and logistic regression, combined with the Bag of Words and TF-IDF methods. Results. A comparative analysis of the resulting models was carried out. The model that combines logistic regression with the Bag of Words method proved the most effective. Conclusions. Logistic regression with the Bag of Words method was the most effective combination for text templates, while stemming and lemmatization did not affect the final model effectiveness indicator. The second type of literary source contains text constructions unique to it, such as “[Electronic resource]” or “date of access”, which increase the chance of correct classification. |
first_indexed | 2024-04-11T04:41:24Z |
format | Article |
id | doaj.art-889ae2e061944daebade84e7d4bc99ba |
institution | Directory Open Access Journal |
issn | 2072-3059 |
language | English |
last_indexed | 2024-04-11T04:41:24Z |
publishDate | 2022-12-01 |
publisher | Penza State University Publishing House |
record_format | Article |
series | Известия высших учебных заведений. Поволжский регион:Технические науки |
spelling | doaj.art-889ae2e061944daebade84e7d4bc99ba; 2022-12-28T05:11:35Z; eng; DOI 10.21685/2072-3059-2022-3-2; authors V.S. Tomashevskaya, Yu.V. Starichkova, D.A. Yakovlev (all MIREA – Russian Technological University) |
title | Using machine learning for recognition of text patterns of literary sources |
topic | natural language processing machine learning naive bayes classifier logistic regression |
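
The comparison described in the abstract (Naive Bayes and logistic regression, each paired with Bag of Words and TF-IDF features) can be sketched roughly as follows. This is a minimal standard-library illustration, not the authors' implementation: the record does not state their tooling, data, or metrics. The reference strings are invented, the reading of the two source types as print (0) and electronic (1) is an assumption, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Invented toy references, NOT the authors' data. Label 0 = print source,
# label 1 = electronic source (an assumed reading of the two source types).
TRAIN = [
    ("Ivanov I. I. Data analysis. Moscow : Nauka, 2015. 320 p.", 0),
    ("Petrov P. P. Text mining methods // Journal of Informatics. 2018. No. 2. P. 10-20.", 0),
    ("Smith J. Pattern recognition. London : Sample Press, 2011. 250 p.", 0),
    ("Sidorov S. S. Neural networks [Electronic resource]. URL: http://example.org/nn (date of access: 01.02.2021).", 1),
    ("Open corpus of texts [Electronic resource]. URL: http://example.org/corpus (date of access: 12.03.2020).", 1),
    ("Brown A. Language models [Electronic resource]. URL: http://example.org/lm (date of access: 05.05.2022).", 1),
]
TEST = [
    ("Kuznetsov K. Statistics. Moscow : Nauka, 2019. 200 p.", 0),
    ("Digital library [Electronic resource]. URL: http://example.org/lib (date of access: 10.10.2021).", 1),
]

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def fit_vocab(docs):
    """Map every training token to a column index (sorted order)."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(vocab)}

def bow(doc, index):
    """Bag of Words: raw term counts; tokens outside the vocabulary are ignored."""
    row = [0.0] * len(index)
    for tok, cnt in Counter(tokenize(doc)).items():
        if tok in index:
            row[index[tok]] = float(cnt)
    return row

def fit_idf(rows):
    """Smoothed inverse document frequency for each vocabulary column."""
    n = len(rows)
    return [math.log((1 + n) / (1 + sum(1 for r in rows if r[j] > 0))) + 1.0
            for j in range(len(rows[0]))]

def train_nb(X, y):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        totals = [sum(col) + 1.0 for col in zip(*rows)]
        norm = sum(totals)
        model[c] = (math.log(len(rows) / len(X)), [math.log(t / norm) for t in totals])
    return model

def predict_nb(model, x):
    """Return the class with the highest log prior plus weighted log likelihood."""
    def score(c):
        prior, loglik = model[c]
        return prior + sum(v * lp for v, lp in zip(x, loglik))
    return max(model, key=score)

def train_lr(X, y, lr=0.1, epochs=500):
    """Logistic regression fitted by plain batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, t in zip(X, y):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - t
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b

def predict_lr(model, x):
    """Predict class 1 when the linear score is positive."""
    w, b = model
    return int(b + sum(wi * xi for wi, xi in zip(w, x)) > 0)

def compare():
    """Accuracy on TEST per (features, classifier) pair, plus LR weights on BoW."""
    docs, labels = [d for d, _ in TRAIN], [t for _, t in TRAIN]
    index = fit_vocab(docs)
    idf = fit_idf([bow(d, index) for d in docs])
    vectorizers = {
        "bow": lambda d: bow(d, index),
        "tfidf": lambda d: [c * w for c, w in zip(bow(d, index), idf)],
    }
    scores, weights = {}, {}
    for fname, vec in vectorizers.items():
        X = [vec(d) for d in docs]
        nb, lr = train_nb(X, labels), train_lr(X, labels)
        preds = {"nb": [predict_nb(nb, vec(d)) for d, _ in TEST],
                 "lr": [predict_lr(lr, vec(d)) for d, _ in TEST]}
        for mname, p in preds.items():
            scores[(fname, mname)] = sum(a == t for a, (_, t) in zip(p, TEST)) / len(TEST)
        if fname == "bow":
            weights = dict(zip(index, lr[0]))  # token -> learned LR coefficient
    return scores, weights
```

Calling `compare()` returns an accuracy for each feature/classifier pair and the logistic-regression coefficient of every token. Sorting those coefficients is one way to surface class-specific constructions such as "electronic" or "access". A toy set this small cannot reproduce the ranking reported in the abstract; it only illustrates the shape of the experiment.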