Distributed Tree-Based Machine Learning for Short-Term Load Forecasting With Apache Spark

Machine learning algorithms have been intensively applied to perform load forecasting to obtain better accuracies as compared to traditional statistical methods. However, with the huge increase in data size, sophisticated models have to be created which require big data platforms. Optimal and effect...

Bibliographic Details
Main Authors:	Ameema Zainab, Ali Ghrayeb, Haitham Abu-Rub, Shady S. Refaat, Othmane Bouhali
Format:	Article
Language:	English
Published:	IEEE 2021-01-01
Series:	IEEE Access
Subjects:	Apache Spark; concurrent computing; load forecasting; parallel processing; resource management
Online Access:	https://ieeexplore.ieee.org/document/9400851/

author	Ameema Zainab; Ali Ghrayeb; Haitham Abu-Rub; Shady S. Refaat; Othmane Bouhali
author_facet	Ameema Zainab; Ali Ghrayeb; Haitham Abu-Rub; Shady S. Refaat; Othmane Bouhali
author_sort	Ameema Zainab
collection	DOAJ
description	Machine learning algorithms have been intensively applied to perform load forecasting to obtain better accuracies as compared to traditional statistical methods. However, with the huge increase in data size, sophisticated models have to be created which require big data platforms. Optimal and effective use of the available computational resources can be attained by maximizing the effective utilization of the cluster nodes. Parallel computing is demanded to allow for optimal resource utilization in dealing with smart grid big data. In this paper, a master-slave parallel computing paradigm is utilized and experimented with for load forecasting in a multi-AMI environment. The paper proposes a concurrent job scheduling algorithm in a multi-energy data source environment using Apache Spark. An efficient resource utilization strategy is proposed for submitting multiple Spark jobs to reduce job completion time. The optimal value of clustering is used in this paper to cluster the data into groups to be able to reduce the computational time additionally. Multiple tree-based machine learning algorithms are tested with parallel computation to evaluate the performance with tunable parameters on a real-world dataset. One thousand distribution transformers’ real data from Spain for three years are used to demonstrate the performance of the proposed methodology with a trade-off between accuracy and processing time.
first_indexed	2024-12-24T04:58:44Z
format	Article
id	doaj.art-3f644369c7c2466bbd3ff57a27d257ab
institution	Directory of Open Access Journals
issn	2169-3536
language	English
last_indexed	2024-12-24T04:58:44Z
publishDate	2021-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-3f644369c7c2466bbd3ff57a27d257ab; 2022-12-21T17:14:18Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2021-01-01; vol. 9, pp. 57372-57384; DOI 10.1109/ACCESS.2021.3072609; document 9400851; Distributed Tree-Based Machine Learning for Short-Term Load Forecasting With Apache Spark; Ameema Zainab (https://orcid.org/0000-0002-3754-4162; Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA); Ali Ghrayeb (https://orcid.org/0000-0002-6808-5886; Department of Electrical and Computer Engineering, Texas A&M University at Qatar, Doha, Qatar); Haitham Abu-Rub (https://orcid.org/0000-0001-8687-3942; Department of Electrical and Computer Engineering, Texas A&M University at Qatar, Doha, Qatar); Shady S. Refaat (https://orcid.org/0000-0001-9392-6141; Department of Electrical and Computer Engineering, Texas A&M University at Qatar, Doha, Qatar); Othmane Bouhali (Research Computing, Texas A&M University at Qatar, Doha, Qatar); https://ieeexplore.ieee.org/document/9400851/; Apache Spark; concurrent computing; load forecasting; parallel processing; resource management
title	Distributed Tree-Based Machine Learning for Short-Term Load Forecasting With Apache Spark
topic	Apache Spark; concurrent computing; load forecasting; parallel processing; resource management
url	https://ieeexplore.ieee.org/document/9400851/

Similar Items