Integrating oversampling and ensemble-based machine learning techniques for an imbalanced dataset in dyslexia screening tests

Bibliographic Details
Main Authors: Shahriar Kaisar, Abdullahi Chowdhury
Format: Article
Language: English
Published: Elsevier 2022-12-01
Series: ICT Express
Subjects: Dyslexia, Imbalanced data, Ensemble technique, Machine learning, Oversampling
Online Access: http://www.sciencedirect.com/science/article/pii/S2405959522000327
collection DOAJ
description Developmental dyslexia is a learning disorder, often identified in school-aged children, in which a child has difficulty reading or spelling words despite having average or above-average intelligence. These difficulties can result in anger, frustration, low self-esteem, and other negative feelings. Early detection of dyslexia can greatly benefit dyslexic children, because their learning needs can then be properly addressed. Researchers have used several testing techniques for early detection, collecting data from reading and writing tests, online games, magnetic resonance imaging (MRI) and electroencephalography (EEG) scans, and picture and video recordings. Several machine learning techniques have also recently been applied to this problem. However, existing works have not focused on imbalanced datasets, in which the percentage of dyslexic participants is much lower than that of non-dyslexic participants, as is expected when pre-screening a random population. This paper addresses the imbalanced dataset obtained from dyslexia pre-screening tests and proposes an oversampling and ensemble-based machine learning technique for detecting dyslexia. Simulation results show that the proposed approach improves the detection accuracy for the minority class (i.e., dyslexic patients) from 80.61% to 83.52%.
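The abstract does not name the specific oversampling method or ensemble the authors use. As a hedged illustration of the general idea only, the Python sketch below assumes plain random oversampling of the minority class followed by a bagged, majority-vote ensemble of decision stumps. The function names (`random_oversample`, `fit_stump`, `fit_bagged_stumps`, `recall`), the 0/1 label encoding, and the toy data are hypothetical and are not taken from the paper. Here `recall` measures the share of minority-class samples that are correctly detected, which is one common reading of "detection accuracy of the minority class".

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    # Assumption: plain random oversampling (duplicate existing rows of each
    # smaller class until all classes match the largest class). The paper's
    # actual oversampling method is not stated in the abstract.
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def fit_stump(X, y):
    # Exhaustive search for the single-feature threshold rule with the fewest
    # training errors; labels are assumed to be 0 (non-dyslexic) / 1 (dyslexic).
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for polarity in (1, -1):
                errors = sum(
                    (1 if polarity * (x[j] - t) >= 0 else 0) != label
                    for x, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, polarity)
    _, j, t, polarity = best
    return lambda x: 1 if polarity * (x[j] - t) >= 0 else 0


def fit_bagged_stumps(X, y, n_estimators=15, seed=0):
    # Bagging: each stump is trained on a bootstrap sample (drawn with
    # replacement); predictions are combined by majority vote.
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]


def recall(predict, X, y, positive=1):
    # Share of positive (minority-class) samples that the model detects.
    hits = [predict(x) == positive for x, lab in zip(X, y) if lab == positive]
    return sum(hits) / len(hits)


if __name__ == "__main__":
    # Toy, made-up data: 20 majority-class rows and 4 minority-class rows.
    X = [[0.1 * i] for i in range(20)] + [[5.0 + 0.1 * i] for i in range(4)]
    y = [0] * 20 + [1] * 4
    X_bal, y_bal = random_oversample(X, y, seed=1)
    model = fit_bagged_stumps(X_bal, y_bal, seed=2)
    print(Counter(y_bal), recall(model, [[6.0], [0.5]], [1, 0]))
```

In practice, oversampling should be applied only to the training split. Duplicating minority rows before splitting would let copies of the same participant appear in the evaluation data and inflate the measured minority-class accuracy.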
first_indexed 2024-04-13T08:20:01Z
id doaj.art-3873af79044f4d4c98b3023cd55f5873
institution Directory Open Access Journal
issn 2405-9595
last_indexed 2024-04-13T08:20:01Z
spelling ICT Express, vol. 8, no. 4, pp. 563–568 (Elsevier, 2022-12-01; ISSN 2405-9595)
Shahriar Kaisar (corresponding author): Department of Information Systems and Business Analytics, RMIT University, Australia
Abdullahi Chowdhury: Faculty of Engineering, Computer and Mathematical Sciences, University of Adelaide, Australia
topic Dyslexia
Imbalanced data
Ensemble technique
Machine learning
Oversampling