Contemporary Persian Inflectional Analyzer

Bibliographic Details
Main Authors: Davood Heidarpour, Elham S.Sebt, Mahmoud Bi Jen Khan, Mostafa Salehi, Hadi Veisi
Format: Article
Language: fas
Published: Iranian Research Institute for Information and Technology 2021-07-01
Series: Iranian Journal of Information Processing & Management
Online Access: http://jipm.irandoc.ac.ir/article-1-4337-en.html
Description
Summary: In recent years, informal writing in Persian has grown significantly, driven by the expansion of cyberspace and social media platforms and by users' tendency to bring the written language closer to colloquial speech. However, few adequate tools exist for processing this register. An inflectional analyzer is one of the basic tools for low-level processing of textual data, yet none has been developed for informal Persian. Informal words have their own structures, stems, morphemes, and clitics, and they also draw on formal structures and units. Moreover, informal text also contains formal words, so any analyzer for informal words must be able to analyze formal words as well. This paper attempts to cover all inflectional structures of informal Persian in order to build an inflectional analyzer. A corpus covering most of the known sub-registers of informal Persian was constructed to extract words, morphemes, inflectional rules, and morphotactics, and part of this corpus was used for testing the analyzer. On the 1,786 unique words extracted from the test portion, the analyzer achieves an F-measure of 97.67%. The tool can be used in computational processing of Persian and in teaching Persian, specifically colloquial Persian, to non-Persian-speaking learners.
ISSN: 2251-8223, 2251-8231
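
Note on the reported F-measure: the summary does not say how the 97.67% figure was computed. The sketch below assumes the standard definition (harmonic mean of precision and recall) applied to sets of analyses per word. The function name, the transliterated words, and the analysis labels are all hypothetical and are not taken from the paper.

```python
def f_measure(gold, predicted):
    """Return the F1 score for predicted analyses against gold analyses.

    Both arguments map a word to a set of analysis strings.
    This layout is an assumption, not the paper's evaluation format.
    """
    tp = fp = fn = 0
    for word, gold_set in gold.items():
        pred_set = predicted.get(word, set())
        tp += len(gold_set & pred_set)   # analyses found and correct
        fp += len(pred_set - gold_set)   # analyses found but wrong
        fn += len(gold_set - pred_set)   # correct analyses that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: transliterated words with made-up analysis labels.
gold = {
    "ketabam": {"ketab+POSS.1SG"},
    "miram": {"r+PRES.1SG", "mir+COP.1SG"},
}
predicted = {
    "ketabam": {"ketab+POSS.1SG"},
    "miram": {"r+PRES.1SG", "mir+POSS.1SG"},
}
score = f_measure(gold, predicted)
print(f"F-measure: {score:.4f}")
```

In this toy case two analyses are correct, one is spurious, and one is missed, so precision and recall are both 2/3.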