Simulation of electricity consumption data using multiple artificial intelligence models and cross validation techniques

Bibliographic Details
Main Authors: Mariam Hosny, Omnia Abu Waraga, Manar Abu Talib, Mohamed Abdallah
Format: Article
Language: English
Published: Elsevier 2023-12-01
Series: Data in Brief
Online Access: http://www.sciencedirect.com/science/article/pii/S2352340923007898
Description
Summary: Worldwide, electricity production exceeds consumption, which wastes financial and energy resources. Machine learning models can predict future consumption and help avoid these losses. This paper presents monthly community-level electricity consumption data for Dubai, United Arab Emirates, covering May 2017–December 2019. The data were acquired from Dubai Pulse, an online repository that hosts consumption data from Dubai Electricity and Water Authority, which provides utility services to the Emirate. Additional parameters, such as population and number of buildings, were acquired from Dubai Statistics Center, and temperature was obtained from Dubai International Airport. Further features, such as expatriate ratio, number of customers, and building occupancy, were computed from the available data and used to build a dataset for accurate prediction. Various linear regression variants, support vector machines, decision tree models, ensemble models, and neural networks were implemented to forecast electricity consumption. The models were trained on two versions of the same dataset: a temporally ordered dataset, generated by sorting the data with respect to time, and a randomly split dataset, generated by dividing the data at random. In addition, the dependence of the models on the amount of data was assessed by varying the size of the testing data. Two cross-validation (CV) procedures, the rolling CV method and the moving CV method, were applied to assess the reliability of the models. All analyses were evaluated using several performance metrics: root mean squared error, coefficient of determination (R²), 10-fold CV score, mean absolute error, median absolute error, and computational time. Furthermore, the data could be used to analyze the effect of coronavirus disease 2019 (COVID-19) prevention measures in Dubai on electricity usage and to evaluate consumption patterns at the consumer level.
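The evaluation setup described in the summary can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes scikit-learn, uses a synthetic single-series stand-in for the 32 monthly records (the real dataset is community-level), and fits only a plain linear regression. The summary does not define the rolling and moving CV methods, so the expanding-window and sliding-window splits below, both built with `TimeSeriesSplit`, are assumptions made for illustration. All variable and function names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             median_absolute_error, r2_score)
from sklearn.model_selection import TimeSeriesSplit, train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for 32 months (May 2017 to December 2019); not real data.
n_months = 32
months = np.arange(n_months)
temperature = 30 + 8 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 1, n_months)
customers = 1000 + 5 * months + rng.normal(0, 10, n_months)
X = np.column_stack([temperature, customers])
y = 50 * temperature + 2 * customers + rng.normal(0, 20, n_months)


def evaluate(y_true, y_pred):
    """Error metrics named in the summary (CV score and timing omitted)."""
    return {
        "rmse": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "r2": float(r2_score(y_true, y_pred)),
        "mae": float(mean_absolute_error(y_true, y_pred)),
        "medae": float(median_absolute_error(y_true, y_pred)),
    }


def fit_and_score(X_train, X_test, y_train, y_test):
    """Fit a linear regression on the training part and score the test part."""
    model = LinearRegression().fit(X_train, y_train)
    return evaluate(y_test, model.predict(X_test))


# Temporally ordered split: the last 25% of months form the test set.
temporal_scores = fit_and_score(
    *train_test_split(X, y, test_size=0.25, shuffle=False))

# Randomly split dataset: months are shuffled before splitting.
random_scores = fit_and_score(
    *train_test_split(X, y, test_size=0.25, random_state=0))


def cv_rmse(splitter):
    """RMSE per fold for a time-aware splitter."""
    return [
        fit_and_score(X[tr], X[te], y[tr], y[te])["rmse"]
        for tr, te in splitter.split(X)
    ]


# Assumed reading of "rolling" CV: expanding training window.
expanding_rmse = cv_rmse(TimeSeriesSplit(n_splits=5))
# Assumed reading of "moving" CV: sliding window capped at 12 months.
sliding_rmse = cv_rmse(TimeSeriesSplit(n_splits=5, max_train_size=12))

print("temporal:", temporal_scores)
print("random:  ", random_scores)
print("expanding-window RMSE:", expanding_rmse)
print("sliding-window RMSE:  ", sliding_rmse)
```

Changing `test_size` in the two `train_test_split` calls is one way to probe how the scores depend on the amount of training data, in the spirit of the test-size variation mentioned in the summary.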
ISSN: 2352-3409