Factor Model-Based Large Covariance Estimation from Streaming Data Using a Knowledge-Based Sketch Matrix

Cited by: 0
Authors
Tan, Xiao [1]
Wang, Zhaoyang [1]
Qian, Hao [2]
Zhou, Jun [2]
Duan, Peibo [3]
Shen, Dian [1]
Wang, Meng [4]
Wang, Beilun [1]
Affiliations
[1] Southeast University, Nanjing, People's Republic of China
[2] Ant Group, Hangzhou, People's Republic of China
[3] Monash University, Melbourne, Victoria, Australia
[4] Tongji University, Shanghai, People's Republic of China
Source
PROCEEDINGS OF THE 33RD ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2024 | 2024
Funding
National Natural Science Foundation of China;
Keywords
Covariance Matrix; Streaming Data; Sketching Algorithm; NUMBER;
DOI
10.1145/3627673.3679820
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Covariance matrix estimation is an important problem in statistics, with wide applications in finance, neuroscience, meteorology, oceanography, and other fields. However, when the data are high-dimensional and are continuously generated and updated in a streaming fashion, covariance matrix estimation faces major challenges, including the curse of dimensionality and limited memory. Existing methods either assume sparsity, ignoring possible common factors among the variables, or perform poorly when recovering the covariance matrix directly from sketched data. To address these issues, we propose a novel method, KEEF (Knowledge-based Time and Memory Efficient Covariance Estimator in Factor Model), and an extended variant. Our method leverages historical data to train a knowledge-based sketch matrix, which is used to accelerate the factor analysis of streaming data and to estimate the covariance matrix directly from the sketched data. We provide theoretical guarantees that show the advantages of our method in time and space complexity as well as accuracy. Extensive experiments on synthetic and real-world data compare KEEF with several state-of-the-art methods and demonstrate its superior performance.
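To make the idea in the abstract concrete, the following is a minimal sketch of estimating a factor-model covariance (low rank plus diagonal) from sketched streaming data. It is not the KEEF algorithm: a plain Gaussian random sketch stands in for the knowledge-based sketch matrix trained on historical data, the data are assumed to be zero-mean, and the function name `sketched_factor_covariance` and all parameters (`k` sketch rows, `r` factors) are hypothetical choices made for illustration.

```python
import numpy as np

def sketched_factor_covariance(stream, p, k, r, rng):
    """Hypothetical helper: one pass over zero-mean rows using O(k * p) memory."""
    Y = np.zeros((k, p))     # sketch Y = S X, accumulated one sample at a time
    sq_sum = np.zeros(p)     # running sum of squares for each variable
    n = 0
    for x in stream:
        # Column of the sketch matrix S for this sample; entries have
        # variance 1/k so that E[Y^T Y] = X^T X (assumed Gaussian sketch).
        s = rng.normal(scale=1.0 / np.sqrt(k), size=k)
        Y += np.outer(s, x)
        sq_sum += x * x
        n += 1
    # The top-r right singular pairs of Y approximate the leading
    # eigenpairs of X^T X, i.e. the common-factor part of the covariance.
    _, sing, Vt = np.linalg.svd(Y, full_matrices=False)
    V = Vt[:r].T
    low_rank = (V * (sing[:r] ** 2 / n)) @ V.T
    # Idiosyncratic variances: what the factors leave unexplained, floored
    # at a small positive value to keep the estimate positive definite.
    var = sq_sum / n
    psi = np.maximum(var - np.diag(low_rank), 1e-8)
    return low_rank + np.diag(psi), var

# Synthetic example: p variables driven by r common factors plus noise.
rng = np.random.default_rng(0)
p, r, n, k = 30, 3, 2000, 100
B = rng.normal(size=(p, r))
X = rng.normal(size=(n, r)) @ B.T + 0.5 * rng.normal(size=(n, p))
sigma_hat, var = sketched_factor_covariance(X, p, k, r, rng)
```

The estimator never stores the n-by-p data matrix; it keeps only the k-by-p sketch and a length-p vector of squared sums. How closely it tracks the true covariance depends on the sketch size `k` and on the sketch matrix itself, which is the component the paper proposes to learn from historical data.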
Pages: 2210-2219
Number of pages: 10