Data pipeline for real-time energy consumption data management and prediction

With the increasing utilization of data in various industries and applications, constructing an efficient data pipeline has become crucial. In this study, we propose a machine learning operations-centric data pipeline specifically designed for an energy consumption management system. This pipeline s...

Bibliographic Details
Main Authors: Jeonghwan Im, Jaekyu Lee, Somin Lee, Hyuk-Yoon Kwon
Format: Article
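Language: English
Published: Frontiers Media S.A. 2024-03-01
Series: Frontiers in Big Data
Subjects: energy consumption; MLOps-centric data pipeline; time-series forecasting; real-time data pipeline; scalable pipeline
Online Access: https://www.frontiersin.org/articles/10.3389/fdata.2024.1308236/full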
author Jeonghwan Im
Jaekyu Lee
Somin Lee
Hyuk-Yoon Kwon
collection DOAJ
description With data use growing across industries and applications, constructing an efficient data pipeline has become crucial. In this study, we propose a machine learning operations (MLOps)-centric data pipeline designed for an energy consumption management system. The pipeline integrates the machine learning model with real-time data management and prediction. Its architecture comprises several key components, including Kafka, InfluxDB, Telegraf, Zookeeper, and Grafana. To predict energy consumption, we adopt two time-series prediction models: long short-term memory (LSTM) and seasonal autoregressive integrated moving average (SARIMA). Our analysis reveals a trade-off between speed and accuracy: SARIMA trains faster, while LSTM predicts more accurately. To validate the pipeline, we measure the overall processing time while optimizing the configuration of Telegraf, which directly affects the load in the pipeline. The pipeline achieves an average end-to-end processing time of 0.39 s for 10,000 data records and 1.26 s for 100,000 records, which is 30.69–90.88 times faster than the existing Python-based approach. Additionally, when the number of records increases by ten times, the increased overhead is reduced by 3.07 times. This indicates that the proposed pipeline is efficient and scalable enough for real-time environments.
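The timing figures in the description can be turned into throughput and scaling numbers with simple arithmetic. The sketch below is a back-of-the-envelope check, not the authors' evaluation code: the names `REPORTED`, `throughput`, and `scaling_factor` are hypothetical, and the only inputs are the two rounded averages quoted in the abstract (0.39 s and 1.26 s), so the results need not match the paper's own 3.07 figure.

```python
# Average end-to-end processing times quoted in the abstract:
# number of records -> seconds. These are rounded, reported values.
REPORTED = {10_000: 0.39, 100_000: 1.26}


def throughput(records: int, seconds: float) -> float:
    """Return the number of records processed per second."""
    return records / seconds


def scaling_factor(small: int, large: int, timings: dict) -> float:
    """Return how many times longer the larger workload takes."""
    return timings[large] / timings[small]


if __name__ == "__main__":
    for n, t in REPORTED.items():
        rate = throughput(n, t)
        print(f"{n:>7} records: {t:.2f} s, about {rate:,.0f} records/s")
    growth = scaling_factor(10_000, 100_000, REPORTED)
    print(f"10x more records -> {growth:.2f}x more processing time")
```

With these inputs, the throughput works out to roughly 25,600 records/s at 10,000 records and roughly 79,400 records/s at 100,000 records, and a tenfold increase in records costs about 3.2 times the processing time, which is consistent with the sub-linear scaling the abstract describes.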
first_indexed 2024-04-24T22:45:13Z
format Article
id doaj.art-e46e76f5d7334b8a9ccd72885f280796
institution Directory of Open Access Journals
issn 2624-909X
language English
last_indexed 2024-04-24T22:45:13Z
publishDate 2024-03-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Big Data
spelling doaj.art-e46e76f5d7334b8a9ccd72885f280796
2024-03-18T13:26:27Z
eng
Frontiers Media S.A.
Frontiers in Big Data
2624-909X
2024-03-01
7
10.3389/fdata.2024.1308236
1308236
Data pipeline for real-time energy consumption data management and prediction
Jeonghwan Im (0)
Jaekyu Lee (1)
Somin Lee (2)
Hyuk-Yoon Kwon (3)
Graduate School of Data Science, Seoul National University of Science and Technology, Seoul, Republic of Korea
Graduate School of Data Science, Seoul National University of Science and Technology, Seoul, Republic of Korea
Department of Global Technology Management, Seoul National University of Science and Technology, Seoul, Republic of Korea
Graduate School of Data Science, Seoul National University of Science and Technology, Seoul, Republic of Korea
https://www.frontiersin.org/articles/10.3389/fdata.2024.1308236/full
energy consumption
MLOps-centric data pipeline
time-series forecasting
real-time data pipeline
scalable pipeline
title Data pipeline for real-time energy consumption data management and prediction
topic energy consumption
MLOps-centric data pipeline
time-series forecasting
real-time data pipeline
scalable pipeline
url https://www.frontiersin.org/articles/10.3389/fdata.2024.1308236/full