Preptimize: Automation of Time Series Data Preprocessing and Forecasting

Cited by: 0
Authors
Usmani, Mehak [1]
Memon, Zulfiqar Ali [1]
Zulfiqar, Adil [2]
Qureshi, Rizwan [3]
Affiliations
[1] Natl Univ Comp & Emerging Sci, Fast Sch Comp, Karachi 65200, Pakistan
[2] Natl Univ Comp & Emerging Sci, Dept Elect Engn, Faisalabad Campus, Faisalabad 38000, Pakistan
[3] Chinese Acad Sci, Hong Kong Inst Sci & Innovat, Ctr Regenerat Med & Hlth, Sci Pk, Hong Kong 999077, Peoples R China
Keywords
automation; optimization; time series; univariate and multivariate; statistical techniques; machine learning; preprocessing; forecasting; interpolation; single and multiple imputation; missing data; NETWORKS;
DOI
10.3390/a17080332
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Time series analysis is pivotal for business and financial decision making, especially with the increasing integration of the Internet of Things (IoT). However, leveraging time series data for forecasting requires extensive preprocessing to address challenges such as missing values, heteroscedasticity, seasonality, outliers, and noise. Different approaches are necessary for univariate and multivariate time series, Gaussian and non-Gaussian time series, and stationary versus non-stationary time series. Handling missing data alone is complex, demanding unique solutions for each type. Extracting statistical features, identifying data quality issues, and selecting appropriate cleaning and forecasting techniques require significant effort, time, and expertise. To streamline this process, we propose an automated strategy called Preptimize, which integrates statistical and machine learning techniques and recommends prediction model blueprints, suggesting the most suitable approaches for a given dataset as an initial step towards further analysis. Preptimize reads a sample from a large dataset and recommends the blueprint model based on optimization, making it easy to use even for non-experts. The results of various experiments indicated that Preptimize either outperformed or had comparable performance to benchmark models across multiple sectors, including stock prices, cryptocurrency, and power consumption prediction. This demonstrates the framework's effectiveness in recommending suitable prediction models for various time series datasets, highlighting its broad applicability across different domains in time series forecasting.
Pages: 25
Related papers
46 in total
  • [1] A novel approach based on combining deep learning models with statistical methods for COVID-19 time series forecasting
    Abbasimehr, Hossein
    Paki, Reza
    Bahrini, Aram
    [J]. NEURAL COMPUTING & APPLICATIONS, 2022, 34 (04) : 3135 - 3149
  • [2] Agiakloglou C., 1992, Journal of Time Series Analysis, V13, P471, DOI 10.1111/j.1467-9892.1992.tb00121.x
  • [3] Hybridization of evolutionary Levenberg-Marquardt neural networks and data pre-processing for stock market prediction
    Asadi, Shahrokh
    Hadavandi, Esmaeil
    Mehmanpazir, Farhad
    Nakhostin, Mohammad Masoud
    [J]. KNOWLEDGE-BASED SYSTEMS, 2012, 35 : 245 - 258
  • [4] Biessmann F, 2019, J MACH LEARN RES, V20
  • [5] Automated Data Pre-processing via Meta-learning
    Bilalli, Besim
    Abello, Alberto
    Aluja-Banet, Tomas
    Wrembel, Robert
    [J]. MODEL AND DATA ENGINEERING, 2016, 9893 : 194 - 208
  • [6] Brown T. A., 2006, Confirmatory factor analysis for applied research
  • [7] Toward automated machine learning in vibrational spectroscopy: Use and settings of genetic algorithms for pre-processing and regression optimization
    Brunel, Benjamin
    Alsamad, Fatima
    Piot, Olivier
    [J]. CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2021, 219
  • [8] Chauhan Karansingh, 2020, 2020 2nd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA). Proceedings, P205, DOI 10.1109/ICIMIA48430.2020.9074859
  • [9] DAEMON: Unsupervised Anomaly Detection and Interpretation for Multivariate Time Series
    Chen, Xuanhao
    Deng, Liwei
    Huang, Feiteng
    Zhang, Chengwei
    Zhang, Zongquan
    Zhao, Yan
    Zheng, Kai
    [J]. 2021 IEEE 37TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2021), 2021 : 2225 - 2230
  • [10] Cryer J.D., 2008, Time Series Analysis: With Applications in R, V2, P31