Dealing with Randomness and Concept Drift in Large Datasets

Bibliographic Details
Main Authors: Kassim S. Mwitondi, Raed A. Said
Format: Article
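Language: English
Published: MDPI AG 2021-07-01
Series: Data
Subjects: artificial neural networks (ANNs), Big Data, concept drift, data science, supervised modelling, sustainable development goals
Online Access: https://www.mdpi.com/2306-5729/6/7/77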
author Kassim S. Mwitondi
Raed A. Said
collection DOAJ
description Data-driven solutions to societal challenges continue to bring new dimensions to our daily lives. For example, while good-quality education is a well-acknowledged foundation of sustainable development, innovation and creativity, variations in student attainment and general performance remain commonplace. Developing data-driven solutions hinges on two fronts: technical and application. The former relates to the modelling perspective, where two of the major challenges are the impact of data randomness and general variations in definitions, typically referred to as concept drift in machine learning. The latter relates to devising data-driven solutions to address real-life challenges such as identifying potential triggers of pedagogical performance, which aligns with the Sustainable Development Goal (SDG) #4, Quality Education. A total of 3145 pedagogical data points were obtained from the central data collection platform for the United Arab Emirates (UAE) Ministry of Education (MoE). Using simple data visualisation and machine learning techniques via a generic algorithm for sampling, measuring and assessing, the paper highlights research pathways for educationists and data scientists to attain unified goals in an interdisciplinary context. Its novelty derives from its embedded capacity to address data randomness and concept drift by minimising modelling variations and yielding consistent results across samples. Results show that intricate relationships among data attributes describe the invariant conditions that practitioners in the two overlapping fields of data science and education must identify.
first_indexed 2024-03-10T09:41:51Z
format Article
id doaj.art-264a011ce4c4412ebf3da1ea617ebae8
institution Directory Open Access Journal
issn 2306-5729
language English
last_indexed 2024-03-10T09:41:51Z
publishDate 2021-07-01
publisher MDPI AG
record_format Article
series Data
spelling doaj.art-264a011ce4c4412ebf3da1ea617ebae8
2023-11-22T03:34:02Z
eng
MDPI AG
Data
2306-5729
2021-07-01
Volume 6, Issue 7, Article 77
DOI 10.3390/data6070077
Dealing with Randomness and Concept Drift in Large Datasets
Kassim S. Mwitondi (Industry & Innovation Research Institute, College of Business, Technology & Engineering, Sheffield Hallam University, 9410 Cantor Building, City Campus, 153 Arundel Street, Sheffield S1 2NU, UK)
Raed A. Said (Faculty of Management, Canadian University Dubai, Al Safa Street-Al Wasl, City Walk Mall, Dubai P.O. Box 415053, United Arab Emirates)
https://www.mdpi.com/2306-5729/6/7/77
artificial neural networks (ANNs)
Big Data
concept drift
data science
supervised modelling
sustainable development goals
title Dealing with Randomness and Concept Drift in Large Datasets
topic artificial neural networks (ANNs)
Big Data
concept drift
data science
supervised modelling
sustainable development goals
url https://www.mdpi.com/2306-5729/6/7/77