A new robust covariance matrix estimation for high-dimensional microbiome data

Times cited: 0
Authors
Wang, Jiyang [1, 2]
Liang, Wanfeng [3]
Li, Lijie [1]
Wu, Yue [1]
Ma, Xiaoyan [4]
Affiliations
[1] Nankai Univ, Sch Stat & Data Sci, Tianjin 300071, Peoples R China
[2] Xinjiang Univ, Coll Math & Syst Sci, Urumqi 830046, Xinjiang, Peoples R China
[3] Dongbei Univ Finance & Econ, Sch Data Sci & Artificial Intelligence, Dalian 116025, Liaoning, Peoples R China
[4] Ningxia Univ, Sch Math & Stat, Yinchuan 750021, Ningxia, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
centred log-ratio; covariance matrix; high dimension; microbiome data; robustness; thresholding; OPTIMAL RATES; COMPOSITIONAL DATA; GUT MICROBIOME; CONVERGENCE; PATTERNS; OBESITY;
DOI
10.1111/anzs.12415
Chinese Library Classification (CLC)
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline codes
020208; 070103; 0714
Abstract
Microbiome data typically lie in a high-dimensional simplex. One of the key questions in metagenomic analysis is how to exploit the covariance structure of such data. In this paper, we develop a framework called approximate-estimate-threshold (AET) for robust estimation of the basis covariance matrix of high-dimensional microbiome data. Specifically, we first construct a proxy matrix $\boldsymbol{\Gamma}$ that is almost indistinguishable from the true basis covariance matrix $\boldsymbol{\Sigma}$. Then, any estimator $\hat{\boldsymbol{\Gamma}}$ satisfying certain conditions can be used to estimate $\boldsymbol{\Gamma}$. Finally, we apply a thresholding step to $\hat{\boldsymbol{\Gamma}}$ to obtain the final estimator $\hat{\boldsymbol{\Sigma}}$. In particular, this paper uses a Huber-type estimator $\hat{\boldsymbol{\Gamma}}$ and achieves robustness by requiring only bounded $(2+\epsilon)$th moments for some $\epsilon \in (0,2]$. We derive the convergence rate of $\hat{\boldsymbol{\Sigma}}$ under the spectral norm and provide theoretical guarantees for support recovery. Extensive simulations and a real-data example illustrate the empirical performance of our method.
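The abstract names the three AET steps but not their exact form, so the following is only a minimal sketch of how such a pipeline might be chained, under assumptions that are ours and not the paper's. We assume the proxy $\boldsymbol{\Gamma}$ is the covariance of the centred log-ratio (clr) transformed compositions (clr appears among the keywords; it is invariant to row normalisation, so the clr of the observed compositions equals the clr of the unobserved log-basis). For $\hat{\boldsymbol{\Gamma}}$ we use a simple entrywise Huber-type construction (a robust mean of each product $Z_j Z_k$ minus the product of robust means), and the final step is plain hard thresholding of off-diagonal entries. The function names `clr`, `huber_mean`, `huber_cov`, `hard_threshold` and `aet_sketch`, and the choices of `tau` and `lam`, are hypothetical.

```python
import numpy as np


def clr(x):
    """Centred log-ratio transform of each row of an n x p composition matrix.

    Assumes strictly positive entries; zeros must be replaced beforehand
    (e.g. with a pseudocount), which this sketch does not do.
    """
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)


def huber_mean(v, tau, n_iter=200, tol=1e-10):
    """Huber M-estimate of location for a 1-D array, computed by
    iteratively reweighted averaging started at the median.
    tau is the robustification parameter (hypothetical tuning)."""
    mu = np.median(v)
    for _ in range(n_iter):
        r = np.abs(v - mu)
        # Huber weights psi(r)/r: 1 inside [-tau, tau], tau/|r| outside
        w = np.minimum(1.0, tau / np.maximum(r, 1e-12))
        mu_new = np.sum(w * v) / np.sum(w)
        if abs(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return mu


def huber_cov(z, tau):
    """Entrywise Huber-type covariance: robust estimate of E[Z_j Z_k]
    minus the product of robust estimates of E[Z_j] and E[Z_k]."""
    _, p = z.shape
    m = np.array([huber_mean(z[:, j], tau) for j in range(p)])
    g = np.empty((p, p))
    for j in range(p):
        for k in range(j, p):
            g[j, k] = g[k, j] = huber_mean(z[:, j] * z[:, k], tau) - m[j] * m[k]
    return g


def hard_threshold(g, lam):
    """Set off-diagonal entries with |g_jk| <= lam to zero; keep the diagonal."""
    s = np.where(np.abs(g) > lam, g, 0.0)
    np.fill_diagonal(s, np.diag(g))
    return s


def aet_sketch(x, tau, lam):
    """Approximate (clr proxy) -> Estimate (Huber-type) -> Threshold."""
    return hard_threshold(huber_cov(clr(x), tau), lam)


# Usage on simulated heavy-tailed data (illustrative only).
rng = np.random.default_rng(0)
n, p = 200, 30
w = rng.standard_t(df=5, size=(n, p))   # heavy-tailed log-basis values
x = np.exp(w)
x = x / x.sum(axis=1, keepdims=True)    # observed compositions (rows sum to 1)
tau = np.sqrt(n / np.log(p))            # hypothetical robustification level
lam = 2.0 * np.sqrt(np.log(p) / n)      # hypothetical threshold level
sigma_hat = aet_sketch(x, tau, lam)
print(sigma_hat.shape)                  # (30, 30)
```

Hard thresholding is used here only because it is the simplest rule to state; a thresholded matrix need not be positive semi-definite, and the paper's actual estimator, tuning parameters and threshold rule may differ.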
Pages: 281-295
Page count: 15