Comprehensive Readability Assessment of Scientific Learning Resources
Readability is a measure of how easy a piece of text is to read. Readability assessment plays a crucial role in helping content writers and proofreaders gauge how easy or difficult a piece of text is. In the literature, classical readability indices, lexical measures, and deep learning-based...
Main Authors: | Muddassira Arshad, Muhammad Murtaza Yousaf, Syed Mansoor Sarwar |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2023-01-01 |
Series: | IEEE Access |
Subjects: | Automated readability index; CS learning resource repository; Flesch Kincaid reading ease; Flesch Kincaid grade index; gunning fog readability index; lexical diversity |
Online Access: | https://ieeexplore.ieee.org/document/10132466/ |
_version_ | 1827930558906236928 |
---|---|
author | Muddassira Arshad; Muhammad Murtaza Yousaf; Syed Mansoor Sarwar |
author_facet | Muddassira Arshad; Muhammad Murtaza Yousaf; Syed Mansoor Sarwar |
author_sort | Muddassira Arshad |
collection | DOAJ |
description | Readability is a measure of how easy a piece of text is to read. Readability assessment plays a crucial role in helping content writers and proofreaders gauge how easy or difficult a piece of text is. In the literature, classical readability indices, lexical measures, and deep learning-based models have been proposed to assess text readability. However, readability assessment using machine and deep learning is a data-intensive task that requires a reasonably sized dataset for accurate assessment. While several datasets, readability indices (RI), and assessment models have been proposed for military agency manuals, health documents, and early educational materials, studies on the readability assessment of computer science literature are limited. To address this gap, we contribute a Computer Science (CS) literature dataset, AGREE, comprising 42,850 learning resources (LR). We assessed the readability of learning objects (LOs) pertaining to the domains of Computer Science (CS), machine learning (ML), software engineering (SE), and natural language processing (NLP). The LOs consist of research papers, lecture notes, and Wikipedia content covering the topic lists of learning repositories for CS, NLP, SE, and ML, in English. From a statistically significant sample of LOs, two annotators manually annotated the text difficulty of each LO and established a gold standard. Text readability was computed using 14 readability indices (RI) and 12 lexical measures (LM). The RIs were ensembled, and the readability measures were used to train a model for readability assessment. The results indicate that the extra trees classifier performs well on the AGREE dataset, exhibiting high accuracy, F1 score, and efficiency. We observed that there is no consensus among readability measures for shorter texts, but as the length of the text increases, the accuracy improves. The AGREE and SELRD datasets, along with the associated readability measures, provide a novel contribution to the field. They can be used to train deep learning models for readability assessment, develop recommender systems, and assist in curriculum planning within the domain of Computer Science. In the future, we plan to scale AGREE by adding more LOs, including multimedia LOs. In addition, we will explore the use of deep learning methods for improved readability assessment. |
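The record's subject keywords name specific classical measures (Flesch Kincaid reading ease, Flesch Kincaid grade index, automated readability index, gunning fog readability index, lexical diversity). The authors' feature-extraction code is not part of this record, so the following is only a minimal Python sketch of how these standard indices are commonly computed; the `score_text` helper and the naive syllable counter are illustrative assumptions, not the paper's implementation.

```python
import re

def count_syllables(word: str) -> int:
    """Very rough vowel-group heuristic; real tools use pronunciation dictionaries."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def score_text(text: str) -> dict:
    """Compute four classical readability indices and one lexical-diversity measure."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return {}
    n_sent, n_words = len(sentences), len(words)
    n_chars = sum(len(w) for w in words)
    n_syll = sum(count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)

    wps = n_words / n_sent   # words per sentence
    spw = n_syll / n_words   # syllables per word
    cpw = n_chars / n_words  # characters per word

    return {
        # Flesch reading ease: higher scores mean easier text
        "flesch_reading_ease": 206.835 - 1.015 * wps - 84.6 * spw,
        # Flesch-Kincaid grade level: approximate US school grade
        "flesch_kincaid_grade": 0.39 * wps + 11.8 * spw - 15.59,
        # Automated readability index: character-based rather than syllable-based
        "automated_readability_index": 4.71 * cpw + 0.5 * wps - 21.43,
        # Gunning fog index: driven by long sentences and polysyllabic words
        "gunning_fog": 0.4 * (wps + 100.0 * complex_words / n_words),
        # Type-token ratio: one simple lexical-diversity measure
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
    }

if __name__ == "__main__":
    sample = ("Readability indices estimate how difficult a text is to read. "
              "Shorter sentences usually score as easier.")
    print(score_text(sample))
```

In practice a library such as textstat offers more careful syllable counting and edge-case handling; the record does not enumerate which 14 indices and 12 lexical measures the authors actually used.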
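The description also states that the readability indices were ensembled and that the readability measures were used to train a model, with an extra trees classifier performing best. Below is a minimal scikit-learn sketch of such a pipeline, under the assumption of a 26-column feature matrix (14 RIs + 12 LMs) and three difficulty classes; the data here is random placeholder data, not the AGREE dataset.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

# Placeholder data: one row per learning object, one column per readability
# or lexical feature; labels are annotated difficulty classes (0/1/2).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 26))
y = rng.integers(0, 3, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Extra trees: an ensemble of extremely randomized decision trees
clf = ExtraTreesClassifier(n_estimators=300, random_state=42)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

print("accuracy:", accuracy_score(y_test, pred))
print("macro F1:", f1_score(y_test, pred, average="macro"))
```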
first_indexed | 2024-03-13T06:38:27Z |
format | Article |
id | doaj.art-c3e96bef13ca477fb677b3d2658dc2ba |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-13T06:38:27Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-c3e96bef13ca477fb677b3d2658dc2ba; 2023-06-08T23:00:51Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2023-01-01; vol. 11, pp. 53978-53994; DOI 10.1109/ACCESS.2023.3279360; article 10132466; Comprehensive Readability Assessment of Scientific Learning Resources; Muddassira Arshad (https://orcid.org/0000-0001-9337-6602), Department of Computer Science, University of the Punjab, Lahore, Pakistan; Muhammad Murtaza Yousaf (https://orcid.org/0000-0001-9578-8811), Department of Software Engineering, University of the Punjab, Lahore, Pakistan; Syed Mansoor Sarwar (https://orcid.org/0000-0003-2377-3201), University of Engineering and Technology, Lahore, Pakistan; https://ieeexplore.ieee.org/document/10132466/; Automated readability index; CS learning resource repository; Flesch Kincaid reading ease; Flesch Kincaid grade index; gunning fog readability index; lexical diversity |
spellingShingle | Muddassira Arshad Muhammad Murtaza Yousaf Syed Mansoor Sarwar Comprehensive Readability Assessment of Scientific Learning Resources IEEE Access Automated readability index CS learning resource repository Flesch Kincaid reading ease Flesch Kincaid grade index gunning fog readability index lexical diversity |
title | Comprehensive Readability Assessment of Scientific Learning Resources |
title_full | Comprehensive Readability Assessment of Scientific Learning Resources |
title_fullStr | Comprehensive Readability Assessment of Scientific Learning Resources |
title_full_unstemmed | Comprehensive Readability Assessment of Scientific Learning Resources |
title_short | Comprehensive Readability Assessment of Scientific Learning Resources |
title_sort | comprehensive readability assessment of scientific learning resources |
topic | Automated readability index CS learning resource repository Flesch Kincaid reading ease Flesch Kincaid grade index gunning fog readability index lexical diversity |
url | https://ieeexplore.ieee.org/document/10132466/ |
work_keys_str_mv | AT muddassiraarshad comprehensivereadabilityassessmentofscientificlearningresources AT muhammadmurtazayousaf comprehensivereadabilityassessmentofscientificlearningresources AT syedmansoorsarwar comprehensivereadabilityassessmentofscientificlearningresources |