Automatic Classification of Text Complexity
This work introduces an automatic classification system for measuring the complexity level of a given Italian text from a linguistic point of view. The task of measuring the complexity of a text is cast as a supervised classification problem by exploiting a dataset of texts purposely produced by li...
Main Authors: | Valentino Santucci, Filippo Santarelli, Luciana Forti, Stefania Spina |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-10-01 |
Series: | Applied Sciences |
Subjects: | natural language processing, text classification, measuring text complexity |
Online Access: | https://www.mdpi.com/2076-3417/10/20/7285 |
_version_ | 1797550624299024384 |
---|---|
author | Valentino Santucci; Filippo Santarelli; Luciana Forti; Stefania Spina |
author_facet | Valentino Santucci; Filippo Santarelli; Luciana Forti; Stefania Spina |
author_sort | Valentino Santucci |
collection | DOAJ |
description | This work introduces an automatic classification system for measuring the complexity level of a given Italian text from a linguistic point of view. The task of measuring the complexity of a text is cast as a supervised classification problem by exploiting a dataset of texts purposely produced by linguistic experts for second language teaching and assessment. The widely adopted Common European Framework of Reference for Languages (CEFR) levels were used as target classes, texts were processed by considering a large set of numeric linguistic features, and an experimental comparison among ten widely used machine learning models was conducted. The results show that the proposed approach achieves good prediction accuracy, and a further analysis was conducted to identify the categories of features that influenced the predictions. |
first_indexed | 2024-03-10T15:32:56Z |
format | Article |
id | doaj.art-df8aca2b88414281b5f23a9ded2338fa |
institution | Directory of Open Access Journals |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-10T15:32:56Z |
publishDate | 2020-10-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-df8aca2b88414281b5f23a9ded2338fa; 2023-11-20T17:33:00Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2020-10-01; vol. 10, no. 20, article 7285; DOI 10.3390/app10207285; Automatic Classification of Text Complexity; Valentino Santucci (Department of Humanities and Social Sciences, University for Foreigners of Perugia, 06123 Perugia, Italy); Filippo Santarelli (Istituto per Applicazioni del Calcolo, CNR, 00185 Roma, Italy); Luciana Forti (Department of Humanities and Social Sciences, University for Foreigners of Perugia, 06123 Perugia, Italy); Stefania Spina (Department of Humanities and Social Sciences, University for Foreigners of Perugia, 06123 Perugia, Italy); https://www.mdpi.com/2076-3417/10/20/7285; natural language processing; text classification; measuring text complexity |
title | Automatic Classification of Text Complexity |
topic | natural language processing; text classification; measuring text complexity |
url | https://www.mdpi.com/2076-3417/10/20/7285 |