Data pipeline for real-time energy consumption data management and prediction

Cited by: 2
Authors
Im, Jeonghwan [1]
Lee, Jaekyu [1]
Lee, Somin [2]
Kwon, Hyuk-Yoon [1]
Affiliations
[1] Seoul Natl Univ Sci & Technol, Grad Sch Data Sci, Seoul, South Korea
[2] Seoul Natl Univ Sci & Technol, Dept Global Technol Management, Seoul, South Korea
Source
FRONTIERS IN BIG DATA | 2024 / Volume 7
Funding
National Research Foundation of Singapore;
Keywords
energy consumption; MLOps-centric data pipeline; time-series forecasting; real-time data pipeline; scalable pipeline; TERM; MODEL;
DOI
10.3389/fdata.2024.1308236
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
With the increasing utilization of data in various industries and applications, constructing an efficient data pipeline has become crucial. In this study, we propose a machine learning operations (MLOps)-centric data pipeline specifically designed for an energy consumption management system. This pipeline seamlessly integrates the machine learning model with real-time data management and prediction capabilities. The overall architecture of our proposed pipeline comprises several key components, including Kafka, InfluxDB, Telegraf, Zookeeper, and Grafana. To enable accurate energy consumption predictions, we adopt two time-series prediction models: long short-term memory (LSTM) and seasonal autoregressive integrated moving average (SARIMA). Our analysis reveals a clear trade-off between speed and accuracy: SARIMA has a shorter model training time, while LSTM outperforms SARIMA in prediction accuracy. To validate the effectiveness of our pipeline, we measure the overall processing time by optimizing the configuration of Telegraf, which directly impacts the load in the pipeline. The results are promising: our pipeline achieves an average end-to-end processing time of only 0.39 s for handling 10,000 data records and 1.26 s when scaling up to 100,000 records. This is 30.69 to 90.88 times faster than the existing Python-based approach. Additionally, when the number of records increases by ten times, the increased overhead is reduced by 3.07 times. This verifies that the proposed pipeline exhibits an efficient and scalable structure suitable for real-time environments.
Pages: 8
Related papers
31 items in total
[1]  
Amarasinghe K, 2017, PROC IEEE INT SYMP, P1483, DOI 10.1109/ISIE.2017.8001465
[2]   Short-term hourly load forecasting using time-series modeling with peak load estimation capability [J].
Amjady, N .
IEEE TRANSACTIONS ON POWER SYSTEMS, 2001, 16 (03) :498-505
[3]  
Burnham K P., 2004, Model selection and multimodel inference. A practical information-theoretic approach, P2
[4]  
Chujai Pasapitch, 2013, IMECS 2013 Proceedings of International Multiconference of Engineers and Computer Scientists, P295
[5]   Greek long-term energy consumption prediction using artificial neural networks [J].
Ekonomou, L. .
ENERGY, 2010, 35 (02) :512-517
[6]   Electric load forecasting by the SVR model with differential empirical mode decomposition and auto regression [J].
Fan, Guo-Feng ;
Peng, Li-Ling ;
Hong, Wei-Chiang ;
Sun, Fan .
NEUROCOMPUTING, 2016, 173 :958-970
[7]   A hybrid method based on wavelet, ANN and ARIMA model for short-term load forecasting [J].
Fard, Abdollah Kavousi ;
Akbari-Zadeh, Mohammad-Reza .
JOURNAL OF EXPERIMENTAL & THEORETICAL ARTIFICIAL INTELLIGENCE, 2014, 26 (02) :167-182
[8]   Real-time Data Infrastructure at Uber [J].
Fu, Yupeng ;
Soman, Chinmay .
SIGMOD '21: PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2021, :2503-2516
[9]   Short term electricity forecasting using individual smart meter data [J].
Gajowniczek, Krzysztof ;
Zabkowski, Tomasz .
KNOWLEDGE-BASED AND INTELLIGENT INFORMATION & ENGINEERING SYSTEMS 18TH ANNUAL CONFERENCE, KES-2014, 2014, 35 :589-597
[10]  
Gogineni VR, 2015, International Journal of Electrical and Computer Engineering (IJECE), V5, P685, DOI 10.11591/ijece.v5i4.pp685-694