Data pipeline for real-time energy consumption data management and prediction
With the increasing utilization of data in various industries and applications, constructing an efficient data pipeline has become crucial. In this study, we propose a machine learning operations-centric data pipeline specifically designed for an energy consumption management system. This pipeline s...
Main Authors: | Jeonghwan Im, Jaekyu Lee, Somin Lee, Hyuk-Yoon Kwon |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A. 2024-03-01 |
Series: | Frontiers in Big Data |
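Subjects: | energy consumption; MLOps-centric data pipeline; time-series forecasting; real-time data pipeline; scalable pipeline |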
Online Access: | https://www.frontiersin.org/articles/10.3389/fdata.2024.1308236/full |
_version_ | 1797257915327840256 |
---|---|
author | Jeonghwan Im Jaekyu Lee Somin Lee Hyuk-Yoon Kwon |
author_facet | Jeonghwan Im Jaekyu Lee Somin Lee Hyuk-Yoon Kwon |
author_sort | Jeonghwan Im |
collection | DOAJ |
description | With the increasing utilization of data in various industries and applications, constructing an efficient data pipeline has become crucial. In this study, we propose a machine learning operations-centric data pipeline specifically designed for an energy consumption management system. This pipeline seamlessly integrates the machine learning model with real-time data management and prediction capabilities. The overall architecture of our proposed pipeline comprises several key components, including Kafka, InfluxDB, Telegraf, Zookeeper, and Grafana. To enable accurate energy consumption predictions, we adopt two time-series prediction models, long short-term memory (LSTM), and seasonal autoregressive integrated moving average (SARIMA). Our analysis reveals a clear trade-off between speed and accuracy, where SARIMA exhibits faster model learning time while LSTM outperforms SARIMA in prediction accuracy. To validate the effectiveness of our pipeline, we measure the overall processing time by optimizing the configuration of Telegraf, which directly impacts the load in the pipeline. The results are promising, as our pipeline achieves an average end-to-end processing time of only 0.39 s for handling 10,000 data records and an impressive 1.26 s when scaling up to 100,000 records. This indicates 30.69–90.88 times faster processing compared to the existing Python-based approach. Additionally, when the number of records increases by ten times, the increased overhead is reduced by 3.07 times. This verifies that the proposed pipeline exhibits an efficient and scalable structure suitable for real-time environments. |
first_indexed | 2024-04-24T22:45:13Z |
format | Article |
id | doaj.art-e46e76f5d7334b8a9ccd72885f280796 |
institution | Directory Open Access Journal |
issn | 2624-909X |
language | English |
last_indexed | 2024-04-24T22:45:13Z |
publishDate | 2024-03-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Big Data |
spelling | doaj.art-e46e76f5d7334b8a9ccd72885f280796; 2024-03-18T13:26:27Z; eng; Frontiers Media S.A.; Frontiers in Big Data; 2624-909X; 2024-03-01; 7; 10.3389/fdata.2024.1308236; 1308236; Data pipeline for real-time energy consumption data management and prediction; Jeonghwan Im (Graduate School of Data Science, Seoul National University of Science and Technology, Seoul, Republic of Korea); Jaekyu Lee (Graduate School of Data Science, Seoul National University of Science and Technology, Seoul, Republic of Korea); Somin Lee (Department of Global Technology Management, Seoul National University of Science and Technology, Seoul, Republic of Korea); Hyuk-Yoon Kwon (Graduate School of Data Science, Seoul National University of Science and Technology, Seoul, Republic of Korea); https://www.frontiersin.org/articles/10.3389/fdata.2024.1308236/full; energy consumption; MLOps-centric data pipeline; time-series forecasting; real-time data pipeline; scalable pipeline |
title | Data pipeline for real-time energy consumption data management and prediction |
topic | energy consumption; MLOps-centric data pipeline; time-series forecasting; real-time data pipeline; scalable pipeline |
url | https://www.frontiersin.org/articles/10.3389/fdata.2024.1308236/full |
work_keys_str_mv | AT jeonghwanim datapipelineforrealtimeenergyconsumptiondatamanagementandprediction AT jaekyulee datapipelineforrealtimeenergyconsumptiondatamanagementandprediction AT sominlee datapipelineforrealtimeenergyconsumptiondatamanagementandprediction AT hyukyoonkwon datapipelineforrealtimeenergyconsumptiondatamanagementandprediction |
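
The scaling figures quoted in the description field can be sanity-checked with a few lines of arithmetic. The sketch below is not from the article; it only uses the rounded timings given in the abstract (0.39 s for 10,000 records, 1.26 s for 100,000 records), and the variable and function names are illustrative. Because the inputs are rounded, the result (about 3.1) is close to, but not identical with, the 3.07 reported by the authors, who presumably worked from unrounded measurements.

```python
# Rounded end-to-end timings as quoted in the record's description field.
records_small, seconds_small = 10_000, 0.39
records_large, seconds_large = 100_000, 1.26


def per_record_us(records: int, seconds: float) -> float:
    """Average processing time per record, in microseconds."""
    return seconds / records * 1e6


# How much the load grew versus how much the processing time grew.
load_growth = records_large / records_small
time_growth = seconds_large / seconds_small

# If time grew as fast as load, this ratio would be 1.0.
# A value above 1.0 means the pipeline scales sub-linearly in time.
scaling_gain = load_growth / time_growth

print(f"per record at 10k:  {per_record_us(records_small, seconds_small):.1f} us")
print(f"per record at 100k: {per_record_us(records_large, seconds_large):.1f} us")
print(f"load x{load_growth:.0f}, time x{time_growth:.2f}, gain x{scaling_gain:.2f}")
```

Read this way, a tenfold increase in records costs only about 3.2 times the processing time, which is the sense in which the abstract describes the overhead as reduced roughly threefold.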