Large Covariance Estimation from Streaming Data with Knowledge-Based Sketch Matrix

Times cited: 0
Authors
Tan, Xiao [1]
Wang, Zhaoyang [1]
Wang, Meng [1]
Shen, Dian [1]
Chen, Weitong [2]
Wang, Beilun [1]
Affiliations
[1] Southeast Univ, Nanjing, Peoples R China
[2] Adelaide Univ, Adelaide, SA 5005, Australia
Source
DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2024, PT 5 | 2024 / Vol. 14854
Keywords
Covariance Matrix; Streaming Data; Sketching Algorithm
DOI
10.1007/978-981-97-5569-1_32
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Covariance matrix estimation is an important problem in statistics, with wide applications in finance, neuroscience, meteorology, oceanography, and other fields. However, when the data are high-dimensional and are continuously generated and updated as a stream, covariance estimation faces major challenges, including the curse of dimensionality and limited memory. Existing methods either assume sparsity, ignoring any common factors shared by the variables, or perform poorly when recovering the covariance matrix directly from sketched data. To address these issues, we propose a novel method, KEEF (Knowledge-based Time and Memory Efficient Covariance Estimator in Factor Model). KEEF uses historical data to train a knowledge-based sketch matrix, which accelerates the factor analysis of the streaming data, and then estimates the covariance matrix directly from the sketched data. We provide theoretical guarantees showing the advantages of our method in time complexity, space complexity, and accuracy. Extensive experiments on synthetic and real-world data compare KEEF with several state-of-the-art methods and demonstrate its superior performance.
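The abstract does not specify how KEEF trains its sketch matrix or forms its estimate. As a rough illustration of the general idea (a factor-model covariance rebuilt from low-dimensional sketches of a stream), the following is a minimal NumPy sketch under stated assumptions: the "knowledge-based" sketch matrix is taken to be the top-k principal directions of the historical data, the stream is zero-mean, and the idiosyncratic covariance is diagonal. The names `train_sketch_matrix` and `StreamingFactorCovariance` are hypothetical. This is not the authors' KEEF algorithm.

```python
import numpy as np


def train_sketch_matrix(X_hist: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical 'knowledge-based' sketch (an assumption, not KEEF's):
    the top-k principal directions of historical data as a p x k matrix."""
    Xc = X_hist - X_hist.mean(axis=0)
    cov = Xc.T @ Xc / (Xc.shape[0] - 1)
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues come back in ascending order
    return eigvecs[:, ::-1][:, :k]    # keep the k leading directions


class StreamingFactorCovariance:
    """Factor-model covariance estimate accumulated from k-dimensional sketches.

    Assumes zero-mean observations and a diagonal idiosyncratic covariance.
    Memory is O(pk + k^2 + p) instead of the O(p^2) needed for a full
    running sample covariance.
    """

    def __init__(self, V: np.ndarray):
        p, k = V.shape
        self.V = V
        self.n = 0
        self.factor_sum = np.zeros((k, k))  # running sum of y y^T
        self.resid_sq_sum = np.zeros(p)      # running sum of squared residuals

    def update(self, x: np.ndarray) -> None:
        y = self.V.T @ x    # sketch: project x onto the factor directions
        r = x - self.V @ y  # part of x not explained by the factors
        self.factor_sum += np.outer(y, y)
        self.resid_sq_sum += r * r
        self.n += 1

    def estimate(self) -> np.ndarray:
        sigma_f = self.factor_sum / self.n              # factor covariance (k x k)
        sigma_u = np.diag(self.resid_sq_sum / self.n)   # diagonal idiosyncratic part
        return self.V @ sigma_f @ self.V.T + sigma_u


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, k = 50, 3
    B = rng.normal(size=(p, k))                      # synthetic factor loadings
    noise_sd = np.sqrt(rng.uniform(0.5, 1.5, size=p))

    def sample(n):
        return rng.normal(size=(n, k)) @ B.T + rng.normal(size=(n, p)) * noise_sd

    V = train_sketch_matrix(sample(2000), k)         # "historical" batch
    est = StreamingFactorCovariance(V)
    for x in sample(5000):                           # "streaming" rows, one at a time
        est.update(x)
    sigma_true = B @ B.T + np.diag(noise_sd ** 2)
    err = np.linalg.norm(est.estimate() - sigma_true) / np.linalg.norm(sigma_true)
    print(f"relative Frobenius error: {err:.3f}")
```

Only the k x k factor sum and a length-p residual vector are updated per observation, which is where the memory saving over a full p x p accumulator comes from in this sketch; the quality of the estimate depends on how well the historical directions match the factor structure of the stream.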
Pages: 493-502
Number of pages: 10