Dealing With Data Streams: An Online, Row-by-Row, Estimation Tutorial

Cited by: 4
Authors
Ippel, Lianne [1]
Kaptein, Maurits [1]
Vermunt, Jeroen [1]
Affiliation
[1] Tilburg Univ, Methodol & Stat, Warandelaan 2, Postbus 90153, NL-5000 LL Tilburg, Netherlands
Keywords
Big Data; data streams; machine learning; online learning; Stochastic Gradient Descent; MAXIMUM-LIKELIHOOD; DESIGN; EM
DOI
10.1027/1614-2241/a000116
Chinese Library Classification (CLC) number
O1 [Mathematics]; C [Social Sciences (General)]
Discipline codes
03; 0303; 0701; 070101
Abstract
Novel technological advances allow distributed and automatic measurement of human behavior. While these technologies provide exciting new research opportunities, they also provide challenges: datasets collected using new technologies grow increasingly large, and in many applications the collected data are continuously augmented. These data streams make the standard computation of well-known estimators inefficient, as the computation has to be repeated each time a new data point enters. In this tutorial paper, we detail online learning, an analysis method that facilitates the efficient analysis of Big Data and continuous data streams. We illustrate how common analysis methods can be adapted for use with Big Data using an online, or "row-by-row," processing approach. We present several simple (and exact) examples of online estimation and discuss Stochastic Gradient Descent as a general (approximate) approach to estimate more complex models. We end this article with a discussion of the methodological challenges that remain.
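The abstract contrasts exact online updates with approximate Stochastic Gradient Descent. The following is a minimal Python sketch of both ideas. It is not the authors' code, and it assumes a stream of numeric (x, y) rows. For the exact case it uses one standard running update (Welford's algorithm for the mean and variance). For the approximate case it takes one SGD step per row for a simple linear regression. The class names, the fixed learning rate, and the simulated data are hypothetical choices made for illustration.

```python
"""Hypothetical sketch of row-by-row ("online") estimation.

Illustration only, not the paper's implementation; assumes a stream of
numeric (x, y) rows that are each seen once and then discarded.
"""
import random


class OnlineMeanVariance:
    """Exact online mean and sample variance (Welford's algorithm).

    Only three numbers are stored, and each update is O(1), so nothing
    has to be recomputed from the full dataset when a new row arrives.
    """

    def __init__(self):
        self.n = 0        # number of observations seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean          # deviation from the old mean
        self.mean += delta / self.n    # shift the mean toward x
        self.m2 += delta * (x - self.mean)  # uses old and new mean

    @property
    def variance(self):
        # Sample variance (n - 1 denominator); undefined for fewer than 2 rows.
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


class OnlineLinearSGD:
    """Approximate online estimate of y = b0 + b1 * x via SGD.

    Each row triggers one gradient step on that row's squared error only.
    """

    def __init__(self, learning_rate=0.01):
        self.b0 = 0.0
        self.b1 = 0.0
        self.learning_rate = learning_rate  # hypothetical fixed step size

    def update(self, x, y):
        error = y - (self.b0 + self.b1 * x)
        # Gradient of 0.5 * error**2 w.r.t. (b0, b1) is (-error, -error * x);
        # stepping against it moves both coefficients as below.
        self.b0 += self.learning_rate * error
        self.b1 += self.learning_rate * error * x


if __name__ == "__main__":
    random.seed(1)  # fixed seed so the simulated stream is reproducible
    stats = OnlineMeanVariance()
    model = OnlineLinearSGD(learning_rate=0.01)
    # Simulated stream: y = 2 + 3x + noise; each row is used once.
    for _ in range(20_000):
        x = random.gauss(0.0, 1.0)
        y = 2.0 + 3.0 * x + random.gauss(0.0, 0.5)
        stats.update(y)
        model.update(x, y)
    print(f"mean(y)={stats.mean:.3f} var(y)={stats.variance:.3f}")
    print(f"b0={model.b0:.3f} b1={model.b1:.3f}")
```

In both cases, memory holds only summary quantities: n, the mean, and m2, or the current coefficients. Each new row therefore costs constant work no matter how many rows came before. The running mean and variance match their offline counterparts exactly. The SGD coefficients, by contrast, are approximate and depend on the chosen learning rate.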
Pages: 124-138
Page count: 15
Related papers
50 in total
  • [41] A New Evolving Clustering Algorithm for Online Data Streams
    Bezerra, Clauber Gomes
    Jales Costa, Bruno Sielly
    Guedes, Luiz Affonso
    Angelov, Plamen Parvanov
    PROCEEDINGS OF THE 2016 IEEE CONFERENCE ON EVOLVING AND ADAPTIVE INTELLIGENT SYSTEMS (EAIS), 2016, : 162 - 168
  • [42] Dynamic adaptation of online ensembles for drifting data streams
    Olorunnimbe, M. Kehinde
    Viktor, Herna L.
    Paquet, Eric
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2018, 50 (02) : 291 - 313
  • [43] Online tree-based ensembles and option trees for regression on evolving data streams
    Ikonomovska, Elena
    Gama, Joao
    Dzeroski, Saso
    NEUROCOMPUTING, 2015, 150 : 458 - 470
  • [44] Online estimation and community detection of network point processes for event streams
    Fang, G.
    Ward, O. G.
    Zheng, T.
    Statistics and Computing, 2024, 34 (1)
  • [45] Temporally adaptive estimation of logistic classifiers on data streams
    Anagnostopoulos, Christoforos
    Tasoulis, Dimitris K.
    Adams, Niall M.
    Hand, David J.
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2009, 3 (03) : 243 - 261
  • [46] Temporally adaptive estimation of logistic classifiers on data streams
    Anagnostopoulos, Christoforos
    Tasoulis, Dimitris K.
    Adams, Niall M.
    Hand, David J.
    Advances in Data Analysis and Classification, 2009, 3 : 243 - 261
  • [47] Online Ensemble Learning of Data Streams with Gradually Evolved Classes
    Sun, Yu
    Tang, Ke
    Minku, Leandro L.
    Wang, Shuo
    Yao, Xin
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (06) : 1532 - 1545
  • [48] Online Clustering for Novelty Detection and Concept Drift in Data Streams
    Garcia, Kemilly Dearo
    Poel, Mannes
    Kok, Joost N.
    de Carvalho, Andre C. P. L. F.
    PROGRESS IN ARTIFICIAL INTELLIGENCE, PT II, 2019, 11805 : 448 - 459
  • [49] A Novel Online Ensemble Approach for Concept Drift in Data Streams
    Sidhu, Parneeta
    Bhatia, M. P. S.
    Bindal, Aditya
    2013 IEEE SECOND INTERNATIONAL CONFERENCE ON IMAGE INFORMATION PROCESSING (ICIIP), 2013, : 550 - 555
  • [50] Robust Sparse Online Learning for Data Streams with Streaming Features
    Chen, Zhong
    He, Yi
    Wu, Di
    Zhan, Huixin
    Sheng, Victor
    Zhang, Kun
    PROCEEDINGS OF THE 2024 SIAM INTERNATIONAL CONFERENCE ON DATA MINING, SDM, 2024, : 181 - 189