Dealing With Data Streams: An Online, Row-by-Row, Estimation Tutorial

Cited by: 4
Authors
Ippel, Lianne [1]
Kaptein, Maurits [1]
Vermunt, Jeroen [1]
Affiliations
[1] Tilburg Univ, Methodol & Stat, Warandelaan 2,Postbus 90153, NL-5000 LL Tilburg, Netherlands
Keywords
Big Data; data streams; machine learning; online learning; Stochastic Gradient Descent; MAXIMUM-LIKELIHOOD; DESIGN; EM
DOI
10.1027/1614-2241/a000116
Chinese Library Classification (CLC) number
O1 [Mathematics]; C [Social Sciences, General]
Discipline classification code
03 ; 0303 ; 0701 ; 070101 ;
Abstract
Novel technological advances allow distributed and automatic measurement of human behavior. While these technologies provide exciting new research opportunities, they also provide challenges: datasets collected using new technologies grow increasingly large, and in many applications the collected data are continuously augmented. These data streams make the standard computation of well-known estimators inefficient as the computation has to be repeated each time a new data point enters. In this tutorial paper, we detail online learning, an analysis method that facilitates the efficient analysis of Big Data and continuous data streams. We illustrate how common analysis methods can be adapted for use with Big Data using an online, or "row-by-row," processing approach. We present several simple (and exact) examples of the online estimation and discuss Stochastic Gradient Descent as a general (approximate) approach to estimate more complex models. We end this article with a discussion of the methodological challenges that remain.
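The row-by-row idea in the abstract can be illustrated with a minimal sketch. This record does not reproduce the paper's own formulas, so the code below is an assumption: it uses the standard Welford-style running update as one example of an exact online estimator (mean and variance), and a single-observation squared-error update as one example of Stochastic Gradient Descent. The names `OnlineMeanVar` and `sgd_linear_step` and the learning rate `lr` are hypothetical and chosen only for illustration.

```python
import random


class OnlineMeanVar:
    """Exact running mean and sample variance, updated one row at a time.

    Assumption: Welford-style update, used here as a generic example of an
    exact online estimator; not taken from the paper itself.
    """

    def __init__(self):
        self.n = 0        # number of observations seen so far
        self.mean = 0.0   # current running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        # Only the summary statistics are stored; the raw data are discarded.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; undefined until at least two rows have arrived.
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


def sgd_linear_step(w, b, x, y, lr=0.1):
    """One SGD step for the model y ~ w * x + b under squared-error loss.

    Assumption: a fixed learning rate `lr`; the estimates are approximate
    and only approach the least-squares solution as more rows arrive.
    """
    err = (w * x + b) - y
    return w - lr * err * x, b - lr * err


def run_sgd_demo(n_rows=5000, seed=0):
    """Stream synthetic rows from y = 2x + 1 and update (w, b) per row."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(n_rows):
        x = rng.uniform(-1.0, 1.0)
        w, b = sgd_linear_step(w, b, x, 2.0 * x + 1.0)
    return w, b
```

In both cases each new data point triggers a constant-time update of a few stored quantities, so nothing has to be recomputed over the full dataset when the stream grows.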
Pages: 124-138
Page count: 15
Related papers
50 in total
  • [21] Personalized online ensemble machine learning with applications for dynamic data streams
    Malenica, Ivana
    Phillips, Rachael V.
    Chambaz, Antoine
    Hubbard, Alan E.
    Pirracchio, Romain
    van der Laan, Mark J.
    STATISTICS IN MEDICINE, 2023, 42 (07) : 1013 - 1044
  • [22] Online Query by Committee for Active Learning from Drifting Data Streams
    Krawczyk, Bartosz
    Wozniak, Michal
    2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2017, : 2120 - 2127
  • [23] Dealing with Data Streams: Complex Event Processing vs. Data Stream Mining
    Lange, Moritz
    Koschel, Arne
    Astrova, Irina
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS, ICCSA 2020, PART IV, 2020, 12252 : 3 - 14
  • [24] Online Bagging and Boosting for Imbalanced Data Streams
    Wang, Boyu
    Pineau, Joelle
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (12) : 3353 - 3366
  • [25] Online Learning and Prediction of Data Streams using Dynamically Evolving Fuzzy Approach
    Baruah, Rashmi Dutta
    Angelov, Plamen
    2013 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ - IEEE 2013), 2013,
  • [26] MIDAS: Open-source framework for distributed online analysis of data streams
    Henelius, Andreas
    Torniainen, Jar
    SOFTWAREX, 2018, 7 : 156 - 161
  • [27] Fast GPU-beamforming of Row-Column Addressed Probe Data
    Stuart, Matthias Bo
    Jensen, Patrick Moller
    Olsen, Julian Thomas Reckeweg
    Kristensen, Alexander Borch
    Schou, Mikkel
    Dammann, Bernd
    Sorensen, Hans Henrik Brandenborg
    Jensen, Jorgen Arendt
    2019 IEEE INTERNATIONAL ULTRASONICS SYMPOSIUM (IUS), 2019, : 1497 - 1500
  • [28] Online active learning for human activity recognition from sensory data streams
    Mohamad, Saad
    Sayed-Mouchaweh, Moamar
    Bouchachia, Abdelhamid
    NEUROCOMPUTING, 2020, 390 (390) : 341 - 358
  • [29] AFQN: approximate Qn estimation in data streams
    Italo Epicoco
    Catiuscia Melle
    Massimo Cafaro
    Marco Pulimeno
    Applied Intelligence, 2022, 52 : 5082 - 5099
  • [30] AFQN: approximate Qn estimation in data streams
    Epicoco, Italo
    Melle, Catiuscia
    Cafaro, Massimo
    Pulimeno, Marco
    APPLIED INTELLIGENCE, 2022, 52 (05) : 5082 - 5099