Estimating Multilevel Models on Data Streams

Cited by: 0
Authors
L. Ippel
M. C. Kaptein
J. K. Vermunt
Affiliations
[1] Maastricht University, Institute of Data Science
[2] Tilburg University
Source
Psychometrika | 2019 / Vol. 84
Keywords
Data streams; expectation maximization algorithm; multilevel models; machine (online) learning; SEMA; nested data
DOI
Not available
Abstract
Social scientists are often faced with data that have a nested structure: pupils are nested within schools, employees are nested within companies, or repeated measurements are nested within individuals. Nested data are typically analyzed using multilevel models. However, when data sets are extremely large or when new data continuously augment the data set, estimating multilevel models can be challenging: the current algorithms used to fit multilevel models repeatedly revisit all data points and consume substantial time and computer memory. This is especially troublesome when predictions are needed in real time and observations keep streaming in. We address this problem by introducing the Streaming Expectation Maximization Approximation (SEMA) algorithm for fitting multilevel models online (or “row-by-row”). In an extensive simulation study, we demonstrate the performance of SEMA compared to traditional methods of fitting multilevel models. Next, SEMA is used to analyze an empirical data stream. The accuracy of SEMA is competitive with current state-of-the-art methods while being orders of magnitude faster.
Pages: 41-64
Page count: 23