Automatic Classification of Text Complexity

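This work introduces an automatic classification system that measures the complexity level of a given Italian text from a linguistic point of view. Measuring the complexity of a text is cast as a supervised classification problem by exploiting a dataset of texts purposely produced by linguistic experts for second language teaching and assessment. The widely adopted Common European Framework of Reference for Languages (CEFR) levels were used as target classes, the texts were processed by computing a large set of numeric linguistic features, and ten widely used machine learning models were compared experimentally. The results show that the proposed approach obtains good prediction accuracy, and a further analysis identified the categories of features that influenced the predictions.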

Bibliographic Details
Main Authors: Valentino Santucci, Filippo Santarelli, Luciana Forti, Stefania Spina
Format: Article
Language: English
Published: MDPI AG 2020-10-01
Series: Applied Sciences
Subjects: natural language processing, text classification, measuring text complexity
Online Access: https://www.mdpi.com/2076-3417/10/20/7285
author Valentino Santucci
Filippo Santarelli
Luciana Forti
Stefania Spina
collection DOAJ
description This work introduces an automatic classification system that measures the complexity level of a given Italian text from a linguistic point of view. Measuring the complexity of a text is cast as a supervised classification problem by exploiting a dataset of texts purposely produced by linguistic experts for second language teaching and assessment. The widely adopted Common European Framework of Reference for Languages (CEFR) levels were used as target classes, the texts were processed by computing a large set of numeric linguistic features, and ten widely used machine learning models were compared experimentally. The results show that the proposed approach obtains good prediction accuracy, and a further analysis identified the categories of features that influenced the predictions.
first_indexed 2024-03-10T15:32:56Z
format Article
id doaj.art-df8aca2b88414281b5f23a9ded2338fa
institution Directory of Open Access Journals
issn 2076-3417
language English
last_indexed 2024-03-10T15:32:56Z
publishDate 2020-10-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-df8aca2b88414281b5f23a9ded2338fa
2023-11-20T17:33:00Z
eng
MDPI AG
Applied Sciences
2076-3417
2020-10-01
Volume 10, Issue 20, Article 7285
10.3390/app10207285
Automatic Classification of Text Complexity
Valentino Santucci (Department of Humanities and Social Sciences, University for Foreigners of Perugia, 06123 Perugia, Italy)
Filippo Santarelli (Istituto per Applicazioni del Calcolo, CNR, 00185 Roma, Italy)
Luciana Forti (Department of Humanities and Social Sciences, University for Foreigners of Perugia, 06123 Perugia, Italy)
Stefania Spina (Department of Humanities and Social Sciences, University for Foreigners of Perugia, 06123 Perugia, Italy)
https://www.mdpi.com/2076-3417/10/20/7285
natural language processing
text classification
measuring text complexity
title Automatic Classification of Text Complexity
topic natural language processing
text classification
measuring text complexity
url https://www.mdpi.com/2076-3417/10/20/7285