Application of Computational Linguistics to Predicting Language Proficiency Level of Persian Learners’ Textbooks

Bibliographic Details
Main Author: Masood Ghayoomi
Format: Article
Language: English
Published: Alzahra University 2022-05-01
Series: Journal of Language Horizons
Subjects: machine learning, classification, feature, computational cognitive model, persian learner
Online Access: https://lghor.alzahra.ac.ir/article_5408_fa9d32e31cb704550c3da396d6a03405.pdf
_version_ 1828796782429077504
author Masood Ghayoomi
author_facet Masood Ghayoomi
author_sort Masood Ghayoomi
collection DOAJ
description One subfield of language proficiency assessment is predicting the language proficiency level. This research proposes a computational linguistic model to predict the language proficiency level and to explore the general properties of the levels. To this end, a corpus is developed from Persian learners' textbooks, and statistical and linguistic features are extracted from this corpus to train three classifiers as learners. The performance of the models varies with the learning algorithm and the feature set(s) used for training. The models were evaluated with four standard metrics: accuracy, precision, recall, and F-measure. Based on the results, the model created by the Random Forest classifier performed best when statistical features extracted from raw text were used. The Support Vector Machine classifier performed best with linguistic features extracted from the automatically annotated corpus. The results show that enriching the model with various kinds of information does not guarantee that a classifier (learner) performs best. To discover the latent teaching methodology of the textbooks, the general performance of the classifiers is studied with respect to the language level and the linguistic knowledge used for creating the model. Based on the obtained results, the number of extracted features plays an important role in training a classifier. Furthermore, the best average performance of the classifiers is obtained by extending the linguistic knowledge from syntactic patterns at proficiency level A (beginner) to all linguistic information at levels B (intermediate) and C (advanced).
first_indexed 2024-12-12T04:28:56Z
format Article
id doaj.art-3b5a424d0cfd467eae26ef34a8aa52ca
institution Directory Open Access Journal
issn 2588-350X
2588-5634
language English
last_indexed 2024-12-12T04:28:56Z
publishDate 2022-05-01
publisher Alzahra University
record_format Article
series Journal of Language Horizons
spelling doaj.art-3b5a424d0cfd467eae26ef34a8aa52ca | 2022-12-22T00:38:08Z | eng | Alzahra University | Journal of Language Horizons | 2588-350X | 2588-5634 | 2022-05-01 | 6 | 1 | 29 | 52 | 10.22051/lghor.2021.32656.1354 | 5408 | Application of Computational Linguistics to Predicting Language Proficiency Level of Persian Learners’ Textbooks | Masood Ghayoomi (Assistant Professor, Faculty of Linguistics, Institute for the Humanities and Cultural Studies, Tehran, Iran) | https://lghor.alzahra.ac.ir/article_5408_fa9d32e31cb704550c3da396d6a03405.pdf | machine learning | classification | feature | computational cognitive model | persian learner
spellingShingle Masood Ghayoomi
Application of Computational Linguistics to Predicting Language Proficiency Level of Persian Learners’ Textbooks
Journal of Language Horizons
machine learning
classification
feature
computational cognitive model
persian learner
title Application of Computational Linguistics to Predicting Language Proficiency Level of Persian Learners’ Textbooks
title_full Application of Computational Linguistics to Predicting Language Proficiency Level of Persian Learners’ Textbooks
title_fullStr Application of Computational Linguistics to Predicting Language Proficiency Level of Persian Learners’ Textbooks
title_full_unstemmed Application of Computational Linguistics to Predicting Language Proficiency Level of Persian Learners’ Textbooks
title_short Application of Computational Linguistics to Predicting Language Proficiency Level of Persian Learners’ Textbooks
title_sort application of computational linguistics to predicting language proficiency level of persian learners textbooks
topic machine learning
classification
feature
computational cognitive model
persian learner
url https://lghor.alzahra.ac.ir/article_5408_fa9d32e31cb704550c3da396d6a03405.pdf
work_keys_str_mv AT masoodghayoomi applicationofcomputationallinguisticstopredictinglanguageproficiencylevelofpersianlearnerstextbooks
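
Note on the evaluation metrics named in the description field: the record states that the classifiers were evaluated with accuracy, precision, recall, and F-measure, but it does not say how per-level scores were combined. The following is a minimal sketch, assuming macro-averaging over the three levels A, B, and C; the function name `evaluate` and the toy gold/predicted labels are hypothetical and are not taken from the article.

```python
from collections import Counter

# Proficiency levels named in the record: A (beginner), B (intermediate), C (advanced).
LEVELS = ("A", "B", "C")


def evaluate(gold, predicted, labels=LEVELS):
    """Return accuracy plus macro-averaged precision, recall, and F-measure.

    Macro-averaging is an assumption of this sketch; the record does not
    state which averaging scheme the article used.
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and equally long")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1          # correct prediction for level g
        else:
            fp[p] += 1          # level p was predicted wrongly
            fn[g] += 1          # level g was missed

    precisions, recalls, f_scores = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f_score = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f_scores.append(f_score)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(gold),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_measure": sum(f_scores) / n,
    }


# Hypothetical toy data: six text samples, two per level.
gold = ["A", "A", "B", "B", "C", "C"]
predicted = ["A", "B", "B", "B", "C", "A"]
scores = evaluate(gold, predicted)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

In this toy run, four of the six samples are labelled correctly, so accuracy is 4/6; the per-level precision and recall values differ because level B is over-predicted while levels A and C each lose one sample.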