A TWO-SAMPLE TEST FOR HIGH-DIMENSIONAL DATA WITH APPLICATIONS TO GENE-SET TESTING

Cited by: 428
Authors
Chen, Song Xi [1 ,2 ]
Qin, Ying-Li [1 ]
Affiliations
[1] Iowa State Univ, Dept Stat, Ames, IA 50011 USA
[2] Peking Univ, Guanghua Sch Management, Beijing 100871, Peoples R China
Keywords
High dimension; gene-set testing; large p small n; martingale central limit theorem; multiple comparison; FALSE DISCOVERY RATE; MICROARRAY DATA; COVARIANCE-MATRIX; HYPOTHESIS TESTS; NORMALIZATION; CONSISTENCY; CATEGORIES; EXPRESSION; LIMIT; MODEL;
DOI
10.1214/09-AOS716
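Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification codes
020208 ; 070103 ; 0714 ;
Abstract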
We propose a two-sample test for the means of high-dimensional data when the data dimension is much larger than the sample size. Hotelling's classical T^2 test does not work for this "large p, small n" situation. The proposed test does not require explicit conditions in the relationship between the data dimension and sample size. This offers much flexibility in analyzing high-dimensional data. An application of the proposed test is in testing significance for sets of genes, which we demonstrate in an empirical study on a leukemia data set.
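The abstract's remark that Hotelling's T^2 test breaks down when p exceeds n can be illustrated with a small sketch. This is not the proposed test, only a demonstration of the standard fact that the pooled sample covariance has rank at most n1 + n2 - 2 and so cannot be inverted when p is larger. The sample sizes, dimension, random integer data, and function names below are hypothetical choices made for illustration.

```python
import random
from fractions import Fraction


def pooled_scatter(x1, x2):
    """Pooled scatter matrix (unnormalized pooled sample covariance) of two samples."""
    p = len(x1[0])
    s = [[Fraction(0)] * p for _ in range(p)]
    for sample in (x1, x2):
        n = len(sample)
        mean = [Fraction(sum(row[k] for row in sample), n) for k in range(p)]
        for row in sample:
            d = [row[k] - mean[k] for k in range(p)]
            for a in range(p):
                for b in range(p):
                    s[a][b] += d[a] * d[b]
    return s


def rank(matrix):
    """Exact matrix rank by Gaussian elimination over the rationals."""
    m = [row[:] for row in matrix]
    ncols = len(m[0])
    r = 0
    for c in range(ncols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            if f != 0:
                m[i] = [m[i][k] - f * m[r][k] for k in range(ncols)]
        r += 1
    return r


rng = random.Random(0)
n1, n2, p = 4, 5, 12  # hypothetical sizes chosen so that p > n1 + n2 - 2
x1 = [[rng.randint(-9, 9) for _ in range(p)] for _ in range(n1)]
x2 = [[rng.randint(-9, 9) for _ in range(p)] for _ in range(n2)]

s_rank = rank(pooled_scatter(x1, x2))
singular = s_rank < p  # True: the T^2 statistic needs the inverse, which does not exist
print(f"p = {p}, rank of pooled scatter = {s_rank}, singular = {singular}")
```

Exact rational arithmetic is used so that the rank is not affected by floating-point tolerance; with real-valued data one would typically use a numerical linear algebra library instead.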
Pages: 808-835
Page count: 28
Related papers
25 in total
  • [1] Adapting to unknown sparsity by controlling the false discovery rate
    Abramovich, Felix
    Benjamini, Yoav
    Donoho, David L.
    Johnstone, Iain M.
    [J]. ANNALS OF STATISTICS, 2006, 34 (02) : 584 - 653
  • [2] [Anonymous], 2005, BIOINFORMATICS COMPU
  • [3] [Anonymous], 2003, Introduction to Nessus
  • [4] Bai ZD, 1996, STAT SINICA, V6, P311
  • [5] Significance analysis of functional categories in gene expression studies: a structured permutation approach
    Barry, WT
    Nobel, AB
    Wright, FA
    [J]. BIOINFORMATICS, 2005, 21 (09) : 1943 - 1949
  • [6] Benjamini Y, 2001, ANN STAT, V29, P1165
  • [7] CONTROLLING THE FALSE DISCOVERY RATE - A PRACTICAL AND POWERFUL APPROACH TO MULTIPLE TESTING
    BENJAMINI, Y
    HOCHBERG, Y
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) : 289 - 300
  • [8] CHEN S.X., 2008, A two Sample Test for High Dimensional Data with Applications to Gene-Set Testing
  • [9] Gene expression profile of adult T-cell acute lymphocytic leukemia identifies distinct subsets of patients with different response to therapy and survival
    Chiaretti, S
    Li, XC
    Gentleman, R
    Vitale, A
    Vignetti, M
    Mandelli, F
    Ritz, J
    Foa, R
    [J]. BLOOD, 2004, 103 (07) : 2771 - 2778
  • [10] Dudoit S., 2008, I MATH STAT COLLECTI, V2, P153, DOI DOI 10.1214/193940307000000446