Comprehensive Readability Assessment of Scientific Learning Resources

Bibliographic Details
Main Authors: Muddassira Arshad, Muhammad Murtaza Yousaf, Syed Mansoor Sarwar
Format: Article
Language: English
Published: IEEE, 2023-01-01
Series: IEEE Access
Subjects: Automated readability index; CS learning resource repository; Flesch-Kincaid reading ease; Flesch-Kincaid grade index; Gunning fog readability index; lexical diversity
Online Access: https://ieeexplore.ieee.org/document/10132466/
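
For context, the subject keywords above name classical readability indices with well-known closed-form definitions over surface statistics of a text. The standard formulations are reproduced below for orientation; the paper computes 14 such indices and ensembles them, and may use variants of these formulas.

```latex
% Standard formulations of four classical readability indices.
% W = words, S = sentences, Y = syllables,
% C = complex words (three or more syllables), L = letters.
\begin{aligned}
\mathrm{FRE}  &= 206.835 - 1.015\,\frac{W}{S} - 84.6\,\frac{Y}{W}
              && \text{(Flesch reading ease)} \\
\mathrm{FKGL} &= 0.39\,\frac{W}{S} + 11.8\,\frac{Y}{W} - 15.59
              && \text{(Flesch-Kincaid grade)} \\
\mathrm{Fog}  &= 0.4\left(\frac{W}{S} + 100\,\frac{C}{W}\right)
              && \text{(Gunning fog)} \\
\mathrm{ARI}  &= 4.71\,\frac{L}{W} + 0.5\,\frac{W}{S} - 21.43
              && \text{(automated readability index)}
\end{aligned}
```

A higher FRE score means easier text; the other three indices map a text to an approximate U.S. school grade level.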
_version_ 1827930558906236928
author Muddassira Arshad
Muhammad Murtaza Yousaf
Syed Mansoor Sarwar
author_facet Muddassira Arshad
Muhammad Murtaza Yousaf
Syed Mansoor Sarwar
author_sort Muddassira Arshad
collection DOAJ
description Readability is a measure of how easy a piece of text is to read. Readability assessment plays a crucial role in guiding content writers and proofreaders on how easy or difficult a piece of text is. In the literature, classical readability indices, lexical measures, and deep learning based models have been proposed to assess text readability. However, readability assessment using machine learning and deep learning is a data-intensive task that requires a reasonably sized dataset for accurate assessment. While several datasets, readability indices (RIs), and assessment models have been proposed for military agency manuals, health documents, and early educational materials, studies on the readability assessment of computer science literature are limited. To address this gap, we contribute AGREE, a computer science (CS) literature dataset comprising 42,850 learning resources (LRs). We assessed the readability of learning objects (LOs) from the domains of CS, machine learning (ML), software engineering (SE), and natural language processing (NLP). The LOs consist of research papers, lecture notes, and Wikipedia content covering the topic lists of English-language learning repositories for CS, NLP, SE, and ML. Two annotators manually annotated the text difficulty of a statistically significant sample of LOs to establish a gold standard. Text readability was computed using 14 readability indices (RIs) and 12 lexical measures (LMs). The RIs were ensembled, and the readability measures were used to train a model for readability assessment. The results indicate that the extra trees classifier performs well on the AGREE dataset, exhibiting high accuracy, F1 score, and efficiency. We observed that there is no consensus among readability measures for shorter texts, but accuracy improves as text length increases. The AGREE and SELRD datasets, along with the associated readability measures, are a novel contribution to the field: they can be used to train deep learning models for readability assessment, to develop recommender systems, and to assist in curriculum planning within CS. In the future, we plan to scale AGREE by adding more LOs, including multimedia LOs, and to explore deep learning methods for improved readability assessment.
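
The pipeline the abstract outlines (compute classical indices over each learning object's text, then train a tree ensemble on annotated difficulty labels) can be sketched as follows. This is a minimal illustration, not the authors' released code: it assumes the third-party textstat and scikit-learn packages, and the texts, labels, and four-index feature set are hypothetical placeholders for the AGREE data.

```python
# A minimal sketch of the pipeline described in the abstract: classical
# readability indices as features, an extra trees classifier as the model.
# Assumes the `textstat` and `scikit-learn` packages; the sample texts and
# labels below are hypothetical placeholders for the AGREE data.
import textstat
from sklearn.ensemble import ExtraTreesClassifier

def readability_features(text):
    """Four of the classical indices named in this record's subject keywords."""
    return [
        textstat.flesch_reading_ease(text),
        textstat.flesch_kincaid_grade(text),
        textstat.gunning_fog(text),
        textstat.automated_readability_index(text),
    ]

# Hypothetical learning-object texts with gold-standard difficulty labels
# (0 = easy, 1 = hard); the real AGREE dataset has 42,850 resources.
texts = [
    "A short introductory note on sorting a list of numbers.",
    "We derive asymptotic convergence guarantees for stochastic subgradient "
    "methods under weak regularity assumptions on the objective.",
]
labels = [0, 1]

X = [readability_features(t) for t in texts]
clf = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(clf.predict(X))  # sanity check on the training texts themselves
```

The abstract does not specify here how the 14 indices were ensembled; a simple choice consistent with the described setup would be majority voting over the difficulty band each index implies, with the 12 lexical measures added as further feature columns.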
first_indexed 2024-03-13T06:38:27Z
format Article
id doaj.art-c3e96bef13ca477fb677b3d2658dc2ba
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-03-13T06:38:27Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-c3e96bef13ca477fb677b3d2658dc2ba
  updated: 2023-06-08T23:00:51Z
  language: eng
  publisher: IEEE
  series: IEEE Access
  issn: 2169-3536
  publishDate: 2023-01-01
  volume: 11
  pages: 53978-53994
  doi: 10.1109/ACCESS.2023.3279360
  ieee document id: 10132466
  title: Comprehensive Readability Assessment of Scientific Learning Resources
  authors: Muddassira Arshad (https://orcid.org/0000-0001-9337-6602), Department of Computer Science, University of the Punjab, Lahore, Pakistan; Muhammad Murtaza Yousaf (https://orcid.org/0000-0001-9578-8811), Department of Software Engineering, University of the Punjab, Lahore, Pakistan; Syed Mansoor Sarwar (https://orcid.org/0000-0003-2377-3201), University of Engineering and Technology, Lahore, Pakistan
  url: https://ieeexplore.ieee.org/document/10132466/
  topics: Automated readability index; CS learning resource repository; Flesch-Kincaid reading ease; Flesch-Kincaid grade index; Gunning fog readability index; lexical diversity
spellingShingle Muddassira Arshad
Muhammad Murtaza Yousaf
Syed Mansoor Sarwar
Comprehensive Readability Assessment of Scientific Learning Resources
IEEE Access
Automated readability index
CS learning resource repository
Flesch-Kincaid reading ease
Flesch-Kincaid grade index
Gunning fog readability index
lexical diversity
title Comprehensive Readability Assessment of Scientific Learning Resources
title_full Comprehensive Readability Assessment of Scientific Learning Resources
title_fullStr Comprehensive Readability Assessment of Scientific Learning Resources
title_full_unstemmed Comprehensive Readability Assessment of Scientific Learning Resources
title_short Comprehensive Readability Assessment of Scientific Learning Resources
title_sort comprehensive readability assessment of scientific learning resources
topic Automated readability index
CS learning resource repository
Flesch-Kincaid reading ease
Flesch-Kincaid grade index
Gunning fog readability index
lexical diversity
url https://ieeexplore.ieee.org/document/10132466/
work_keys_str_mv AT muddassiraarshad comprehensivereadabilityassessmentofscientificlearningresources
AT muhammadmurtazayousaf comprehensivereadabilityassessmentofscientificlearningresources
AT syedmansoorsarwar comprehensivereadabilityassessmentofscientificlearningresources