Post-Selection Inference Following Aggregate Level Hypothesis Testing in Large-Scale Genomic Data

Cited by: 16
Authors
Heller, Ruth [1, 5]
Chatterjee, Nilanjan [2, 3]
Krieger, Abba [4 ]
Shi, Jianxin [5 ]
Affiliations
[1] Tel Aviv Univ, Dept Stat & Operat Res, IL-9667801 Tel Aviv, Israel
[2] Johns Hopkins Univ, Dept Biostat, Bloomberg Sch Publ Hlth, Baltimore, MD 21205 USA
[3] Johns Hopkins Univ, Sch Med, Dept Oncol, Baltimore, MD 21205 USA
[4] Univ Penn, Dept Stat, Philadelphia, PA USA
[5] NCI, Biostat Branch, Div Canc Epidemiol & Genet, Rockville, MD USA
Funding
Israel Science Foundation
Keywords
Conditional p-value; False discovery rate; Multiple testing; Selective inference
DOI
10.1080/01621459.2017.1375933
Chinese Library Classification number
O21 [Probability and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
In many genomic applications, hypothesis tests are performed by aggregating test statistics across units within naturally defined classes, which gives more power to identify signals. Following class-level testing, it is naturally of interest to identify the lower-level units that contain true signals. Testing the individual units within a class without taking into account the fact that the class was selected using an aggregate-level test statistic will produce biased inference. We develop a hypothesis testing framework that guarantees control of false positive rates conditional on the fact that the class was selected. Specifically, we develop procedures for calculating unit-level p-values that allow rejection of null hypotheses while controlling two types of conditional error rates, one relating to the family-wise error rate and the other to the false discovery rate. We use simulation studies to illustrate the validity and power of the proposed procedure in comparison with several possible alternatives. We illustrate the power of the method in a natural application involving whole-genome expression quantitative trait loci (eQTL) analysis across 17 tissue types using data from The Cancer Genome Atlas (TCGA) Project. Supplementary materials for this article are available online.
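The selection bias described in the abstract can be illustrated with a small sketch. It is not the authors' procedure: it assumes independent standard normal unit statistics under the null, a hypothetical sum-of-squares aggregate test with a user-supplied `threshold`, and conditioning on the observed values of the other units, which gives a closed-form conditional p-value. The function names are illustrative only.

```python
import math


def two_sided_tail(x):
    """P(|Z| >= x) for a standard normal Z and x >= 0."""
    return math.erfc(x / math.sqrt(2.0))


def naive_p_value(z_i):
    """Two-sided p-value that ignores how the class was selected."""
    return two_sided_tail(abs(z_i))


def conditional_p_value(z, i, threshold):
    """p-value for unit i given that the class passed the aggregate test.

    Assumption (not from the source): the class is selected when
    sum(z_j ** 2) > threshold, and we condition on the other units' values.
    Given those values, selection is the event Z_i ** 2 > threshold - rest.
    """
    rest = sum(v * v for j, v in enumerate(z) if j != i)
    if z[i] * z[i] + rest <= threshold:
        raise ValueError("class was not selected by the aggregate test")
    cutoff = math.sqrt(max(threshold - rest, 0.0))
    # Null tail probability renormalised to the selection event.
    return two_sided_tail(abs(z[i])) / two_sided_tail(cutoff)


# Hypothetical class of three units; only the first drives the selection.
z_scores = [2.5, 0.3, -0.4]
threshold = 6.0
for i, z_i in enumerate(z_scores):
    print(i, naive_p_value(z_i), conditional_p_value(z_scores, i, threshold))
```

In this toy class the first unit is the reason the class was selected, so its conditional p-value is much larger than its naive one, while a unit whose neighbours alone would have triggered selection keeps its naive p-value.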
Pages: 1770-1783
Number of pages: 14