Examination of text's lexis using a Polish dictionary

Bibliographic Details
Main Authors: Roman Voitovych, Edyta Łukasik
Format: Article
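Language: English
Published: Lublin University of Technology 2021-12-01
Series: Journal of Computer Sciences Institute
Subjects: natural language processing; lexis analysis; Jaccard similarity coefficient; Partitioning Around Medoids
Online Access: https://ph.pollub.pl/index.php/jcsi/article/view/2731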
collection DOAJ
description This paper presents an approach to comparing and classifying books written in Polish by comparing their lexis fields. Books can be classified by features such as literature type, literary genre, style, and author. Using a preassembled dictionary and the Jaccard index, we managed to prove a compact hypothesis concerning similar books. Further analysis with the PAM clustering algorithm revealed a lexical connection between books of the same type or author. The overall static behaviour of similarities within any particular field, together with some anomalous tendencies in other cases, suggests that other features can also be recognized. The method presented in this article allows conclusions to be drawn about the connection between arbitrary books based solely on their vocabulary.
id doaj.art-5c7431208ca44b738ce5b14447d92415
institution Directory Open Access Journal
issn 2544-0764
volume 21
doi 10.35784/jcsi.2731
affiliation Lublin University of Technology
topic natural language processing
lexis analysis
Jaccard similarity coefficient
Partitioning Around Medoids
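
The description in this record mentions comparing books through a preassembled dictionary and the Jaccard index. A minimal sketch of that idea follows, assuming each book is reduced to the set of its word forms found in the dictionary. The function names, the tokenisation rule and the toy data are hypothetical illustrations, not taken from the article, and the record does not say how the authors handle inflected forms.

```python
import re

def vocabulary(text, dictionary):
    """Return the set of lowercase word forms in `text` that occur in `dictionary`."""
    tokens = re.findall(r"\w+", text.lower())
    return {t for t in tokens if t in dictionary}

def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|; taken as 0.0 here when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

# Hypothetical toy dictionary and "books" (assumptions for illustration only).
dictionary = {"kot", "pies", "dom", "las", "rzeka"}
book_a = "Kot i pies. Dom, las!"
book_b = "Pies, las i rzeka."

vocab_a = vocabulary(book_a, dictionary)
vocab_b = vocabulary(book_b, dictionary)
similarity = jaccard(vocab_a, vocab_b)
print(similarity)
```

The pairwise values computed this way could be turned into distances (for example `1 - similarity`) and passed to a k-medoids implementation to approximate the PAM step named in the description. That step is left out here because it needs a third-party library.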