Automated Search and Analysis of the Stylometric Features that Describe the Style of the Prose 19th-21st Centuries

The article compares stylometric features of several levels that serve as markers of prose style and analyzes stylistic changes in Russian and British prose of the 19th-21st centuries. Stylometric features include the low-level features based on the words and s...

Bibliographic Details
Main Authors: Ksenia V. Lagutina, Alla M. Manakhova
Format: Article
Language: English
Published: Yaroslavl State University 2020-09-01
Series: Моделирование и анализ информационных систем (Modeling and Analysis of Information Systems)
Subjects:
Online Access: https://www.mais-journal.ru/jour/article/view/1352
_version_ 1797877811744079872
author Ksenia V. Lagutina
Alla M. Manakhova
collection DOAJ
description The article compares stylometric features of several levels that serve as markers of prose style and analyzes stylistic changes in Russian and British prose of the 19th-21st centuries. The stylometric features include low-level features based on words and characters and high-level features based on rhythm. These features model the style of a text and indicate the time when the text was created. All features are calculated fully automatically, which makes it possible to conduct large-scale experiments on long literary works and speeds up the work of a linguist. The ProseRhythmDetector program is used to calculate the stylometric features, including those based on the results of the search for rhythmic figures. As a result, each text is represented by the same set of features at three levels: characters, words, and rhythm. Texts are grouped by decade, and the average values of the stylometric features are computed for each decade. The resulting decade models are compared using standard similarity metrics, and the comparison results are visualized as heat maps and dendrograms. Experiments with two corpora of Russian and British texts show that over the 19th-21st centuries both corpora share general trends in style change, for example, a decrease in the number of rhythmic figures per sentence, while each language also shows its own trends, for example, in the dynamics of word and sentence lengths. Stylometric features of all levels reveal similarity in the style of texts published within one century. Moreover, features of the three levels taken together demonstrate the uniqueness of each decade better than features of any single level. This study shows the importance of stylometric features as style markers of different eras and makes it possible to identify trends in style over several centuries.
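The decade-level comparison described in the abstract can be illustrated with a minimal sketch. It is not the ProseRhythmDetector implementation: the function names, the three toy features, and the choice of cosine similarity as the "standard similarity metric" are all assumptions made for illustration, and the numbers are invented.

```python
import math
from collections import defaultdict


def decade_of(year):
    """Map a publication year to the first year of its decade, e.g. 1867 -> 1860."""
    return year // 10 * 10


def decade_models(texts):
    """Average the feature vectors of all texts published in the same decade.

    `texts` is a list of (year, feature_vector) pairs; every vector must
    list the same features in the same order.
    """
    groups = defaultdict(list)
    for year, vector in texts:
        groups[decade_of(year)].append(vector)
    return {
        decade: [sum(column) / len(vectors) for column in zip(*vectors)]
        for decade, vectors in sorted(groups.items())
    }


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def similarity_matrix(models):
    """Return the ordered decades and their pairwise similarity matrix."""
    decades = list(models)
    matrix = [
        [cosine_similarity(models[row], models[col]) for col in decades]
        for row in decades
    ]
    return decades, matrix


# Invented feature vectors (hypothetical values, not data from the article):
# [average word length, average sentence length, rhythmic figures per sentence]
toy_texts = [
    (1861, [4.6, 24.0, 0.9]),
    (1868, [4.8, 22.0, 0.7]),
    (1995, [4.3, 14.0, 0.3]),
]
models = decade_models(toy_texts)
decades, matrix = similarity_matrix(models)
```

The resulting matrix is the kind of input a plotting or clustering library could turn into a heat map or dendrogram. In practice the features would likely need to be normalized first, since otherwise the feature with the largest scale (here, sentence length) dominates the cosine value.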
first_indexed 2024-04-10T02:24:07Z
format Article
id doaj.art-786a1d6d476a40888696373601f5a95f
institution Directory Open Access Journal
issn 1818-1015
2313-5417
language English
last_indexed 2024-04-10T02:24:07Z
publishDate 2020-09-01
publisher Yaroslavl State University
record_format Article
series Моделирование и анализ информационных систем (Modeling and Analysis of Information Systems)
spelling doaj.art-786a1d6d476a40888696373601f5a95f; 2023-03-13T08:07:35Z; eng; Yaroslavl State University; Моделирование и анализ информационных систем (Modeling and Analysis of Information Systems); ISSN 1818-1015, 2313-5417; 2020-09-01; vol. 27, no. 3, pp. 330-343; DOI 10.18255/1818-1015-2020-3-330-343; 1011; Automated Search and Analysis of the Stylometric Features that Describe the Style of the Prose 19th-21st Centuries; Ksenia V. Lagutina, Alla M. Manakhova (both: Ярославский государственный университет им. П.Г. Демидова, P.G. Demidov Yaroslavl State University); https://www.mais-journal.ru/jour/article/view/1352
title Automated Search and Analysis of the Stylometric Features that Describe the Style of the Prose 19th-21st Centuries
topic text rhythm
rhythm analysis
natural language processing
stylometry
rhythmic devices
automation
url https://www.mais-journal.ru/jour/article/view/1352