Dealing With Data Streams: An Online, Row-by-Row, Estimation Tutorial

Times cited: 4
Authors
Ippel, Lianne [1]
Kaptein, Maurits [1]
Vermunt, Jeroen [1]
Affiliation
[1] Tilburg University, Methodology and Statistics, Warandelaan 2, Postbus 90153, NL-5000 LL Tilburg, Netherlands
Keywords
Big Data; data streams; machine learning; online learning; Stochastic Gradient Descent; maximum likelihood; design; EM
DOI
10.1027/1614-2241/a000116
Chinese Library Classification (CLC) number
O1 [Mathematics]; C [Social Sciences, General]
Subject classification codes
03; 0303; 0701; 070101
Abstract
Novel technological advances allow distributed and automatic measurement of human behavior. While these technologies provide exciting new research opportunities, they also pose challenges: datasets collected using new technologies grow increasingly large, and in many applications the collected data are continuously augmented. These data streams make the standard computation of well-known estimators inefficient, as the computation has to be repeated each time a new data point arrives. In this tutorial paper, we detail online learning, an analysis method that facilitates the efficient analysis of Big Data and continuous data streams. We illustrate how common analysis methods can be adapted for use with Big Data using an online, or "row-by-row," processing approach. We present several simple (and exact) examples of online estimation and discuss Stochastic Gradient Descent as a general (approximate) approach to estimating more complex models. We end this article with a discussion of the methodological challenges that remain.
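The abstract contrasts recomputing an estimator from scratch on every new row with updating it row by row. The following is a minimal sketch of that idea, not the authors' code. The abstract does not list which exact examples the paper covers, so we assume a running mean and sample variance computed with Welford's update (exact, matching the batch result up to floating-point rounding). For the approximate case, we assume stochastic gradient descent on a simple linear regression with a fixed learning rate. The class names, the learning rate, and the simulated stream are hypothetical choices made only for this illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class OnlineMeanVar:
    """Exact running mean and sample variance, updated one row at a time (Welford's update)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean          # deviation from the old mean
        self.mean += delta / self.n    # new mean
        # combines the old-mean and new-mean deviations, which keeps m2 exact
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # sample variance (n - 1 denominator); undefined for fewer than two rows
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


@dataclass
class SGDLinearRegression:
    """Approximate online estimate of y = b0 + b1 * x via stochastic gradient descent."""
    learning_rate: float = 0.01  # hypothetical fixed step size
    b0: float = 0.0
    b1: float = 0.0

    def update(self, x: float, y: float) -> None:
        # one step along the negative gradient of 0.5 * (y - yhat)^2 for this single row
        error = y - (self.b0 + self.b1 * x)
        self.b0 += self.learning_rate * error
        self.b1 += self.learning_rate * error * x


# Usage: rows arrive one at a time; no past rows are stored or revisited.
stats = OnlineMeanVar()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)
print(stats.n, stats.mean, stats.variance)

model = SGDLinearRegression(learning_rate=0.05)
rng = random.Random(0)  # simulated stream: y = 1 + 2x + noise (illustrative only)
for _ in range(20000):
    x = rng.uniform(0.0, 1.0)
    model.update(x, 1.0 + 2.0 * x + rng.gauss(0.0, 0.1))
print(model.b0, model.b1)
```

Each update touches only a constant-size summary (`n`, `mean`, `m2`, or the two coefficients), so the cost per incoming row does not grow with the length of the stream. The SGD coefficients only fluctuate near the values a batch fit would give rather than reproducing them exactly; a decaying learning rate is a common alternative to the fixed step assumed here.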
Pages: 124-138
Page count: 15