Comprehensive Readability Assessment of Scientific Learning Resources

Bibliographic Details
Main Authors: Muddassira Arshad, Muhammad Murtaza Yousaf, Syed Mansoor Sarwar
Format: Article
Language: English
Published: IEEE, 2023-01-01
Series: IEEE Access
Subjects: Automated readability index; CS learning resource repository; Flesch-Kincaid reading ease; Flesch-Kincaid grade index; Gunning fog readability index; lexical diversity
Online Access: https://ieeexplore.ieee.org/document/10132466/
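
For context, the subject keywords above name classical readability indices with well-known closed-form definitions over surface statistics of a text. The standard formulations are reproduced below for orientation; the paper computes 14 such indices and ensembles them, and may use variants of these formulas.

```latex
% Standard formulations of four classical readability indices.
% W = words, S = sentences, Y = syllables,
% C = complex words (three or more syllables), L = letters.
\begin{aligned}
\mathrm{FRE}  &= 206.835 - 1.015\,\frac{W}{S} - 84.6\,\frac{Y}{W}
              && \text{(Flesch reading ease)} \\
\mathrm{FKGL} &= 0.39\,\frac{W}{S} + 11.8\,\frac{Y}{W} - 15.59
              && \text{(Flesch-Kincaid grade)} \\
\mathrm{Fog}  &= 0.4\left(\frac{W}{S} + 100\,\frac{C}{W}\right)
              && \text{(Gunning fog)} \\
\mathrm{ARI}  &= 4.71\,\frac{L}{W} + 0.5\,\frac{W}{S} - 21.43
              && \text{(automated readability index)}
\end{aligned}
```

A higher FRE score means easier text; the other three indices map a text to an approximate U.S. school grade level.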
_version_ 1827930558906236928
author Muddassira Arshad
Muhammad Murtaza Yousaf
Syed Mansoor Sarwar
author_facet Muddassira Arshad
Muhammad Murtaza Yousaf
Syed Mansoor Sarwar
author_sort Muddassira Arshad
collection DOAJ
description Readability is a measure of how easy a piece of text is to read. Readability assessment plays a crucial role in guiding content writers and proofreaders on how easy or difficult a piece of text is. In the literature, classical readability indices, lexical measures, and deep learning based models have been proposed to assess text readability. However, readability assessment using machine learning and deep learning is a data-intensive task that requires a reasonably sized dataset for accurate assessment. While several datasets, readability indices (RIs), and assessment models have been proposed for military agency manuals, health documents, and early educational materials, studies on the readability assessment of computer science literature are limited. To address this gap, we contribute AGREE, a computer science (CS) literature dataset comprising 42,850 learning resources (LRs). We assessed the readability of learning objects (LOs) from the domains of CS, machine learning (ML), software engineering (SE), and natural language processing (NLP). The LOs consist of research papers, lecture notes, and Wikipedia content covering the topic lists of English-language learning repositories for CS, NLP, SE, and ML. Two annotators manually annotated the text difficulty of a statistically significant sample of LOs to establish a gold standard. Text readability was computed using 14 readability indices (RIs) and 12 lexical measures (LMs). The RIs were ensembled, and the readability measures were used to train a model for readability assessment. The results indicate that the extra trees classifier performs well on the AGREE dataset, exhibiting high accuracy, F1 score, and efficiency. We observed that there is no consensus among readability measures for shorter texts, but accuracy improves as text length increases. The AGREE and SELRD datasets, along with the associated readability measures, are a novel contribution to the field: they can be used to train deep learning models for readability assessment, to develop recommender systems, and to assist in curriculum planning within CS. In the future, we plan to scale AGREE by adding more LOs, including multimedia LOs, and to explore deep learning methods for improved readability assessment.
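
The pipeline the abstract outlines (compute classical indices over each learning object's text, then train a tree ensemble on annotated difficulty labels) can be sketched as follows. This is a minimal illustration, not the authors' released code: it assumes the third-party textstat and scikit-learn packages, and the texts, labels, and four-index feature set are hypothetical placeholders for the AGREE data.

```python
# A minimal sketch of the pipeline described in the abstract: classical
# readability indices as features, an extra trees classifier as the model.
# Assumes the `textstat` and `scikit-learn` packages; the sample texts and
# labels below are hypothetical placeholders for the AGREE data.
import textstat
from sklearn.ensemble import ExtraTreesClassifier

def readability_features(text):
    """Four of the classical indices named in this record's subject keywords."""
    return [
        textstat.flesch_reading_ease(text),
        textstat.flesch_kincaid_grade(text),
        textstat.gunning_fog(text),
        textstat.automated_readability_index(text),
    ]

# Hypothetical learning-object texts with gold-standard difficulty labels
# (0 = easy, 1 = hard); the real AGREE dataset has 42,850 resources.
texts = [
    "A short introductory note on sorting a list of numbers.",
    "We derive asymptotic convergence guarantees for stochastic subgradient "
    "methods under weak regularity assumptions on the objective.",
]
labels = [0, 1]

X = [readability_features(t) for t in texts]
clf = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(clf.predict(X))  # sanity check on the training texts themselves
```

The abstract does not specify here how the 14 indices were ensembled; a simple choice consistent with the described setup would be majority voting over the difficulty band each index implies, with the 12 lexical measures added as further feature columns.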
first_indexed 2024-03-13T06:38:27Z
format Article
id doaj.art-c3e96bef13ca477fb677b3d2658dc2ba
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-03-13T06:38:27Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-c3e96bef13ca477fb677b3d2658dc2ba
  updated: 2023-06-08T23:00:51Z
  language: eng
  publisher: IEEE
  series: IEEE Access
  issn: 2169-3536
  publishDate: 2023-01-01
  volume: 11
  pages: 53978-53994
  doi: 10.1109/ACCESS.2023.3279360
  ieee document id: 10132466
  title: Comprehensive Readability Assessment of Scientific Learning Resources
  authors: Muddassira Arshad (https://orcid.org/0000-0001-9337-6602), Department of Computer Science, University of the Punjab, Lahore, Pakistan; Muhammad Murtaza Yousaf (https://orcid.org/0000-0001-9578-8811), Department of Software Engineering, University of the Punjab, Lahore, Pakistan; Syed Mansoor Sarwar (https://orcid.org/0000-0003-2377-3201), University of Engineering and Technology, Lahore, Pakistan
  url: https://ieeexplore.ieee.org/document/10132466/
  topics: Automated readability index; CS learning resource repository; Flesch-Kincaid reading ease; Flesch-Kincaid grade index; Gunning fog readability index; lexical diversity
spellingShingle Muddassira Arshad
Muhammad Murtaza Yousaf
Syed Mansoor Sarwar
Comprehensive Readability Assessment of Scientific Learning Resources
IEEE Access
Automated readability index
CS learning resource repository
Flesch-Kincaid reading ease
Flesch-Kincaid grade index
Gunning fog readability index
lexical diversity
title Comprehensive Readability Assessment of Scientific Learning Resources
title_full Comprehensive Readability Assessment of Scientific Learning Resources
title_fullStr Comprehensive Readability Assessment of Scientific Learning Resources
title_full_unstemmed Comprehensive Readability Assessment of Scientific Learning Resources
title_short Comprehensive Readability Assessment of Scientific Learning Resources
title_sort comprehensive readability assessment of scientific learning resources
topic Automated readability index
CS learning resource repository
Flesch-Kincaid reading ease
Flesch-Kincaid grade index
Gunning fog readability index
lexical diversity
url https://ieeexplore.ieee.org/document/10132466/
work_keys_str_mv AT muddassiraarshad comprehensivereadabilityassessmentofscientificlearningresources
AT muhammadmurtazayousaf comprehensivereadabilityassessmentofscientificlearningresources
AT syedmansoorsarwar comprehensivereadabilityassessmentofscientificlearningresources