Summary: | Background. Natural language processing, a branch of artificial intelligence,
addresses problems such as machine translation, text sentiment analysis, and text
classification. This article considers the application of machine learning and data mining
methods to the problem of recognizing text patterns. The object of the study is the types of literary
sources. The subject of the research is the classification of literary sources using machine
learning methods. The purpose of the work is to compare the effectiveness of machine
learning methods for the binary classification of literary sources and to
identify the distinctive features inherent in each of them. Materials and methods. Literary
sources were classified using the Naive Bayes classifier and logistic regression, with text
represented by the Bag of Words and TF-IDF methods. Results. A comparative analysis of the obtained models
was carried out. The model that combines logistic regression with the Bag of Words
method demonstrated the greatest efficiency. Conclusions. Logistic regression
combined with the Bag of Words method demonstrated the greatest efficiency when working
with text patterns, while stemming and lemmatization did not affect the
model's final efficiency score. The second type of literary sources contains text constructions
unique to it, such as “[Electronic resource]” or “date of access”, which increase the
chance of correct classification.
|
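
The approach summarized above can be illustrated with a minimal sketch, which is not the authors' implementation. It builds Bag of Words counts in plain Python and fits a multinomial Naive Bayes classifier with Laplace smoothing. The four training references, the class labels `print` and `electronic`, and the function names are all hypothetical and chosen only for illustration; the abstract does not give the actual data set or name the two source types.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(samples):
    """Fit a multinomial Naive Bayes model on (text, label) pairs."""
    word_counts = {}        # label -> Counter of token counts (Bag of Words)
    doc_counts = Counter()  # label -> number of training documents
    for text, label in samples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-posterior score."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore tokens never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data: two invented classes of bibliography entries.
TRAIN = [
    ("Ivanov I. I. Machine learning methods. Moscow: Nauka, 2015. 320 p.",
     "print"),
    ("Petrov P. P. Text analysis. Saint Petersburg: Piter, 2018. 210 p.",
     "print"),
    ("Sidorov S. S. Data mining [Electronic resource]. "
     "URL: http://example.org (date of access: 01.02.2020).", "electronic"),
    ("Text classification [Electronic resource]. "
     "URL: http://example.com (date of access: 10.03.2021).", "electronic"),
]

model = train_naive_bayes(TRAIN)
print(predict(model, "Neural networks [Electronic resource]. "
                     "URL: http://example.net (date of access: 05.05.2022)."))
```

In this toy setting, tokens such as "electronic", "resource", "date" and "access" occur only in one class, so they dominate the score. This mirrors the abstract's observation that constructions unique to one source type increase the chance of correct classification.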