Adaptive and Efficient Streaming Time Series Forecasting with Lambda Architecture and Spark

Cited by: 11
Authors
Pandya, Arjun [1]
Odunsi, Oluwatobiloba [1]
Liu, Chen [2]
Cuzzocrea, Alfredo [3]
Wang, Jianwu [1]
Affiliations
[1] Univ Maryland Baltimore Cty, Dept Informat Syst, Baltimore, MD 21228 USA
[2] North China Univ Technol, Beijing, Peoples R China
[3] Univ Calabria, iDEA Lab, Calabria, Italy
Source
2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2020
Keywords
Time Series Forecasting; Vector Auto Regression (VAR); Concept Drift; Lambda Architecture; Spark; MODEL;
DOI
10.1109/BigData50022.2020.9377947
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The rise of Internet of Things (IoT) devices and streaming platforms has tremendously increased the volume of data in motion, or streaming data. Such data takes many forms, for example social media posts, online gamers' in-game activities, mobile or web application logs, online e-commerce transactions, financial trading, and geospatial services. Accurate and efficient forecasting based on real-time data is a critical part of operations in areas such as energy and utility consumption, healthcare, industrial production, supply chain, weather forecasting, financial trading, and agriculture. Statistical time series forecasting methods such as Autoregression (AR), Autoregressive Integrated Moving Average (ARIMA), and Vector Autoregression (VAR) face the challenge of concept drift in streaming data, i.e., the statistical properties of the stream may change over time. A second challenge is efficiency: the Machine Learning (ML) models built on these algorithms must be updated quickly enough to keep up with the drift. In this paper, we propose a novel framework to tackle both challenges. We address adaptability by applying the Lambda architecture to forecast the future state with three approaches simultaneously: batch (historic) data-based prediction, streaming (real-time) data-based prediction, and hybrid prediction that combines the first two. We address efficiency by implementing a distributed VAR algorithm on top of the Apache Spark big data platform. To evaluate our framework, we conducted streaming time series forecasting experiments on four types of data sets: data without drift (no drift), data with gradual drift, data with abrupt drift, and data with mixed drift. The experiments show how the three forecasting approaches differ in accuracy and adaptability.
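The three prediction paths named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it replaces the distributed VAR model on Spark with a univariate AR(1) model fitted by ordinary least squares in plain Python, and it assumes the hybrid prediction is a fixed weighted average of the batch and streaming predictions, which the abstract does not specify. The names `fit_ar1`, `forecast_next`, `window`, and `batch_weight` are hypothetical.

```python
from typing import Dict, List, Tuple


def fit_ar1(series: List[float]) -> Tuple[float, float]:
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    x_prev, x_next = series[:-1], series[1:]
    n = len(x_prev)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    var_prev = sum((p - mean_prev) ** 2 for p in x_prev)
    if var_prev == 0.0:
        # Constant input: no slope can be estimated, predict the mean.
        return mean_next, 0.0
    cov = sum((p - mean_prev) * (q - mean_next)
              for p, q in zip(x_prev, x_next))
    phi = cov / var_prev
    return mean_next - phi * mean_prev, phi


def forecast_next(series: List[float], window: int = 20,
                  batch_weight: float = 0.5) -> Dict[str, float]:
    """One-step-ahead forecasts from the batch, streaming, and hybrid paths."""
    # Batch path: model fitted on the full history.
    c_b, phi_b = fit_ar1(series)
    # Streaming path: model fitted on the most recent window only.
    c_s, phi_s = fit_ar1(series[-window:])
    last = series[-1]
    batch = c_b + phi_b * last
    stream = c_s + phi_s * last
    # Hybrid path (assumed rule): fixed weighted average of the two.
    hybrid = batch_weight * batch + (1.0 - batch_weight) * stream
    return {"batch": batch, "stream": stream, "hybrid": hybrid}


if __name__ == "__main__":
    # Synthetic series with an abrupt level shift after 60 observations.
    data = ([10.0 + (i % 3) for i in range(60)]
            + [50.0 + (i % 3) for i in range(30)])
    print(forecast_next(data))
```

Fitting the streaming model on a short recent window lets it follow a drifted stream, while the batch model retains long-run structure; the weighted average is only one of several plausible ways to combine them.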
Pages: 5182-5190
Page count: 9
Related papers
17 in total
  • [1] [Anonymous], 2000, TIME SERIES FORECAST, DOI 10.1201/9781420036206/TIME-SERIES-FORECASTING-CHRIS-CHATFIELD
  • [2] Baier L., 2020, WI2020 Zentrale Tracks, P210
  • [3] Bifet A., 2011, TECH REP
  • [4] LAG ORDER AND CRITICAL-VALUES OF THE AUGMENTED DICKEY-FULLER TEST
    CHEUNG, YW
    LAI, KS
    [J]. JOURNAL OF BUSINESS & ECONOMIC STATISTICS, 1995, 13 (03) : 277 - 280
  • [5] THE EFFECTIVENESS OF ANTITERRORISM POLICIES - A VECTOR-AUTOREGRESSION-INTERVENTION ANALYSIS
    ENDERS, W
    SANDLER, T
    [J]. AMERICAN POLITICAL SCIENCE REVIEW, 1993, 87 (04) : 829 - 844
  • [6] Major Technical Advancements in Apache Hive
    Huai, Yin
    Chauhan, Ashutosh
    Gates, Alan
    Hagleitner, Gunther
    Hanson, Eric N.
    O'Malley, Owen
    Pandey, Jitendra
    Yuan, Yuan
    Lee, Rubao
    Zhang, Xiaodong
    [J]. SIGMOD'14: PROCEEDINGS OF THE 2014 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2014, : 1235 - 1246
  • [7] The Design and Implementation of Vector Autoregressive Model and Structural Vector Autoregressive Model Based on Spark
    Li, Tao
    Li, Xueyu
    Zhang, Xu
    [J]. 2017 3RD INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING AND COMMUNICATIONS (BIGCOM), 2017, : 386 - 394
  • [8] Applying spark based machine learning model on streaming big data for health status prediction
    Nair, Lekha R.
    Shetty, Sujala D.
    Shetty, Siddhanth D.
    [J]. COMPUTERS & ELECTRICAL ENGINEERING, 2018, 65 : 393 - 399
  • [9] Nason G.P., 2006, STATIONARY NONSTATIO
  • [10] On the use of URLs and hashtags in age prediction of Twitter users
    Pandya, Abhinay
    Oussalah, Mourad
    Monachesi, Paola
    Kostakos, Panos
    Loven, Lauri
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION (IRI), 2018, : 62 - 69