Examination of text's lexis using a Polish dictionary
This paper presents an approach to compare and classify books written in the Polish language by comparing their lexis fields. Books can be classified by their features, such as literature type, literary genre, style, author, etc. Using a preassembled dictionary and Jaccard index, we managed to prove...
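The vocabulary comparison described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record gives neither the dictionary nor the preprocessing, so the toy `dictionary`, the sample texts, and the helper names `vocabulary` and `jaccard` are all assumptions made for illustration. The sketch reduces each text to the set of its words found in the dictionary and then computes the Jaccard index, the size of the intersection of two sets divided by the size of their union.

```python
import re


def vocabulary(text, dictionary):
    """Return the set of lowercased words in `text` that occur in `dictionary`."""
    # \w+ matches Unicode word characters in Python 3, so Polish letters are kept.
    tokens = re.findall(r"\w+", text.lower())
    return {token for token in tokens if token in dictionary}


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B|; taken as 0.0 here when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical toy dictionary and texts (assumptions, not data from the paper).
dictionary = {"kot", "pies", "dom", "las", "rzeka"}
book_a = "Kot i pies, dom."
book_b = "Pies, las i dom."

vocab_a = vocabulary(book_a, dictionary)
vocab_b = vocabulary(book_b, dictionary)
similarity = jaccard(vocab_a, vocab_b)
print(similarity)
```

Pairwise values computed this way for a collection of books form a similarity matrix, and one minus each value gives a distance that a clustering method such as PAM could take as input.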
Main Authors: | Roman Voitovych, Edyta Łukasik |
---|---|
Format: | Article |
Language: | English |
Published: | Lublin University of Technology 2021-12-01 |
Series: | Journal of Computer Sciences Institute |
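Subjects: | natural language processing, lexis analysis, Jaccard similarity coefficient, Partitioning Around Medoids |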
Online Access: | https://ph.pollub.pl/index.php/jcsi/article/view/2731 |
_version_ | 1818898664564719616 |
---|---|
author | Roman Voitovych, Edyta Łukasik |
author_facet | Roman Voitovych, Edyta Łukasik |
author_sort | Roman Voitovych |
collection | DOAJ |
description | This paper presents an approach to compare and classify books written in the Polish language by comparing their lexis fields. Books can be classified by their features, such as literature type, literary genre, style, author, etc. Using a preassembled dictionary and Jaccard index, we managed to prove a compact hypothesis concerning similar books. Further analysis with the PAM clustering algorithm presented a lexical connection between books of the same type or author. Overall static behaviour of similarities of any particular field on one side and some anomalous tendencies in other cases suggest that recognition of other features is possible. The method presented in this article allows drawing conclusions regarding the connection between any arbitrary books based solely on their vocabulary. |
first_indexed | 2024-12-19T19:35:40Z |
format | Article |
id | doaj.art-5c7431208ca44b738ce5b14447d92415 |
institution | Directory of Open Access Journals |
issn | 2544-0764 |
language | English |
last_indexed | 2024-12-19T19:35:40Z |
publishDate | 2021-12-01 |
publisher | Lublin University of Technology |
record_format | Article |
series | Journal of Computer Sciences Institute |
spelling | doaj.art-5c7431208ca44b738ce5b14447d92415 2022-12-21T20:08:28Z eng Lublin University of Technology Journal of Computer Sciences Institute 2544-0764 2021-12-01 21 10.35784/jcsi.2731 Examination of text's lexis using a Polish dictionary Roman Voitovych, Edyta Łukasik Lublin University of Technology https://ph.pollub.pl/index.php/jcsi/article/view/2731 natural language processing, lexis analysis, Jaccard similarity coefficient, Partitioning Around Medoids |
title | Examination of text's lexis using a Polish dictionary |
topic | natural language processing, lexis analysis, Jaccard similarity coefficient, Partitioning Around Medoids |
url | https://ph.pollub.pl/index.php/jcsi/article/view/2731 |