OGM: Online Gaussian graphical models on the fly

Cited by: 0
Authors
Sijia Yang
Haoyi Xiong
Yunchao Zhang
Yi Ling
Licheng Wang
Kaibo Xu
Zeyi Sun
Affiliations
[1] Beijing University of Posts and Telecommunications, School of Cyber Space Security, State Key Laboratory of Switching and Networking
[2] Baidu Research, Big Data Lab
[3] Baidu Inc., Department of Computer Science
[4] Missouri University of Science and Technology, Mininglamp Academy of Sciences
[5] Mininglamp Technology
Source
Applied Intelligence | 2022 / Vol. 52
Keywords
Advanced analytics; Online learning over streaming data; Gaussian graphical models
DOI
Not available
Abstract
Gaussian graphical models are widely used to understand the dependencies between variables in high-dimensional data and can enable a wide range of applications such as principal component analysis, discriminant analysis, and canonical analysis. Motivated by the streaming nature of big data, this paper studies a novel Online Gaussian Graphical Model (OGM) that estimates the inverse covariance matrix over high-dimensional streaming data. Specifically, given a small number of samples to initialize the learning process, OGM first computes a low-rank estimate of the inverse covariance matrix; then, as each new sample arrives, it updates that estimate using a low-complexity updating rule that requires neither the past data nor a matrix inversion. The significant edges of the Gaussian graphical model can be discovered by thresholding the estimated inverse covariance matrix. Theoretical analysis shows that OGM converges to the true parameters with a Bernstein-style guarantee on the rate under mild conditions. We evaluate OGM using extensive experiments, and the evaluation results back up our theory.
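The abstract does not give the updating rule itself, so the following is only a minimal sketch of the general idea (a per-sample update of an inverse covariance estimate that needs neither past data nor a matrix inversion, followed by thresholding). It swaps in the classical Sherman-Morrison rank-one update rather than the OGM rule, assumes zero-mean samples, and uses hypothetical names throughout (`OnlinePrecision`, `ridge`, `threshold`).

```python
# Illustrative sketch only: a Sherman-Morrison rank-one update, NOT the OGM rule
# from the paper. All names here are hypothetical; samples are assumed zero-mean.


class OnlinePrecision:
    """Tracks the inverse of S = ridge * I + sum_i x_i x_i^T, one sample at a time."""

    def __init__(self, dim, ridge=1.0):
        self.dim = dim
        self.n = 0
        # The inverse of ridge * I is known in closed form, so no inversion is needed.
        self.inv_scatter = [
            [(1.0 / ridge if i == j else 0.0) for j in range(dim)]
            for i in range(dim)
        ]

    def update(self, x):
        """Fold in one sample in O(dim^2) time without revisiting past data."""
        p = self.inv_scatter
        # px = P x; P is symmetric, so the same vector serves as x^T P.
        px = [sum(p[i][j] * x[j] for j in range(self.dim)) for i in range(self.dim)]
        denom = 1.0 + sum(x[i] * px[i] for i in range(self.dim))
        # (S + x x^T)^{-1} = P - (P x)(P x)^T / (1 + x^T P x)
        for i in range(self.dim):
            for j in range(self.dim):
                p[i][j] -= px[i] * px[j] / denom
        self.n += 1

    def precision(self):
        """Inverse of the ridge-regularized sample covariance S / n (requires n > 0)."""
        return [[self.n * v for v in row] for row in self.inv_scatter]

    def edges(self, threshold):
        """Pairs (i, j), i < j, whose precision entry exceeds the threshold in magnitude."""
        prec = self.precision()
        return [
            (i, j)
            for i in range(self.dim)
            for j in range(i + 1, self.dim)
            if abs(prec[i][j]) > threshold
        ]
```

For instance, after `est = OnlinePrecision(2)` and `est.update([1.0, 1.0])`, calling `est.edges(0.1)` reports the single pair `(0, 1)`. The ridge term here only stands in for the paper's initialization from a small batch of samples, which the abstract describes as a low-rank estimate.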
Pages: 3103-3117
Page count: 14