A Two-Sample Test for Equality of Means in High Dimension

Cited by: 76
Authors
Gregory, Karl Bruce [1]
Carroll, Raymond J. [1]
Baladandayuthapani, Veerabhadran [2]
Lahiri, Soumendra N. [3]
Affiliations
[1] Texas A&M Univ, Dept Stat, College Stn, TX 77843 USA
[2] Univ Texas MD Anderson Canc Ctr, Dept Biostat, Houston, TX 77230 USA
[3] N Carolina State Univ, Dept Stat, Raleigh, NC 27695 USA
Funding
National Science Foundation (USA);
Keywords
Copy number variation; Heteroscedasticity; Large p; False discovery rate; Segmentation;
DOI
10.1080/01621459.2014.934826
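Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification codes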
020208 ; 070103 ; 0714 ;
Abstract
We develop a test statistic for testing the equality of two population mean vectors in the "large-p-small-n" setting. Such a test must surmount the rank-deficiency of the sample covariance matrix, which breaks down the classic Hotelling T^2 test. The proposed procedure, called the generalized component test, avoids full estimation of the covariance matrix by assuming that the p components admit a logical ordering such that the dependence between components is related to their displacement. The test is shown to be competitive with other recently developed methods under ARMA and long-range dependence structures and to achieve superior power for heavy-tailed data. The test does not assume equality of covariance matrices between the two populations, is robust to heteroscedasticity in the component variances, and requires very little computation time, which allows its use in settings with very large p. Analyses of mitochondrial calcium concentration in mouse cardiac muscles over time and of copy number variations in a glioblastoma multiforme dataset from The Cancer Genome Atlas are carried out to illustrate the test. Supplementary materials for this article are available online.
Pages: 837-849
Number of pages: 13