Selection-adjusted inference: an application to confidence intervals for cis-eQTL effect sizes

Cited by: 7
Authors
Panigrahi, Snigdha [1]
Zhu, Junjie [2]
Sabatti, Chiara [3,4]
Affiliations
[1] Univ Michigan, Dept Stat, 451 West Hall, 1085 South Univ, Ann Arbor, MI 48109 USA
[2] Stanford Univ, Dept Elect Engn, 350 Serra Mall, Stanford, CA 94305 USA
[3] Stanford Univ, Dept Biomed Data Sci, 390 Serra Mall, Stanford, CA 94305 USA
[4] Stanford Univ, Dept Stat, 390 Serra Mall, Stanford, CA 94305 USA
Funding
US National Science Foundation
Keywords
Conditional inference; Confidence intervals; Effect size estimation; eQTL; Randomization; Selection bias; Winner's curse; False discovery rate
DOI
10.1093/biostatistics/kxz024
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
The goal of expression quantitative trait loci (eQTL) studies is to identify the genetic variants that influence the expression levels of the genes in an organism. High-throughput technology has made such studies possible: in a given tissue sample, it enables us to quantify the expression levels of approximately 20 000 genes and to record the alleles present at millions of genetic polymorphisms. While obtaining these data is relatively cheap once a specimen is at hand, obtaining human tissue remains a costly endeavor: eQTL studies continue to be based on relatively small sample sizes, and this limitation is particularly serious for tissues such as brain and liver, which are often the organs of most immediate medical relevance. Given the high-dimensional nature of these datasets and the large number of hypotheses tested, the scientific community adopted multiplicity adjustment procedures early on. These testing procedures primarily control the false discovery rate for the identification of genetic variants that influence expression levels. In contrast, a problem that has received little attention to date is that of estimating the effect sizes associated with these variants in a way that accounts for the considerable amount of selection. Yet, given the difficulty of procuring additional samples, this challenge is of practical importance. In this work, we illustrate how the recently developed conditional inference approach can be deployed to obtain confidence intervals for eQTL effect sizes with reliable coverage. The procedure we propose is based on a randomized hierarchical strategy with a twofold contribution: (1) it reflects the selection steps typically adopted in state-of-the-art investigations, and (2) it introduces the use of randomness, instead of data splitting, to maximize the use of the available data.
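The conditional approach can be illustrated in one dimension. The sketch below is a toy, not the randomized hierarchical procedure of the paper: a standardized effect estimate Z ~ N(mu, 1) is reported only if Z + W > t, where W ~ N(0, gamma^2) is independent noise standing in for the randomization. Given selection, Z has density proportional to phi(z - mu) P(W > t - z), a one-parameter exponential family in mu, so its CDF at the observed z decreases in mu and can be inverted by bisection to give a selection-adjusted interval. The threshold t, the noise scale gamma, and every function name are hypothetical choices made for illustration.

```python
import math
from statistics import NormalDist

SQRT2 = math.sqrt(2.0)


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_sf(x):
    """Standard normal upper tail P(N(0,1) > x), via erfc for tail accuracy."""
    return 0.5 * math.erfc(x / SQRT2)


def simpson(f, a, b, n=800):
    """Composite Simpson rule for the integral of f over [a, b]."""
    n += n % 2  # Simpson needs an even number of subintervals
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def conditional_cdf(z, mu, t, gamma):
    """P_mu(Z <= z | Z + W > t) with Z ~ N(mu, 1) and W ~ N(0, gamma^2)."""
    def dens(x):
        # Density of Z at x times the probability that the randomized test selects it
        return norm_pdf(x - mu) * norm_sf((t - x) / gamma)

    lo = min(mu, z) - 10.0
    hi = max(mu, z, t) + 10.0
    below = simpson(dens, lo, z)
    above = simpson(dens, z, hi)
    return below / (below + above)


def selective_ci(z, t, gamma=1.0, alpha=0.1, width=10.0, iters=50):
    """Hypothetical helper: invert the conditional CDF (decreasing in mu) by bisection.

    Endpoints are searched only within [z - width, z + width].
    """
    def solve(target):
        a, b = z - width, z + width
        for _ in range(iters):
            m = 0.5 * (a + b)
            if conditional_cdf(z, m, t, gamma) > target:
                a = m  # CDF still too large: the root lies at a larger mu
            else:
                b = m
        return 0.5 * (a + b)

    return solve(1.0 - alpha / 2.0), solve(alpha / 2.0)


def naive_ci(z, alpha=0.1):
    """Unadjusted interval z +/- z_{1 - alpha/2}, ignoring selection."""
    half = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z - half, z + half


if __name__ == "__main__":
    for z_obs in (2.1, 3.0, 8.0):
        print(z_obs, naive_ci(z_obs), selective_ci(z_obs, t=2.0))
```

When z lies far above t, selection is almost certain and the adjusted interval essentially matches the naive z ± 1.645 (alpha = 0.1); near the threshold both endpoints move down, undoing the upward push of selection. A larger gamma makes selection depend less on Z, so the adjustment is milder, but true signals are selected less reliably. The bisection only searches within `width` of z, so an endpoint beyond that range would be reported at the edge.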
Analysis of the GTEx Liver dataset (v6) suggests that naively obtained confidence intervals would likely not cover the true values of effect sizes and that the number of local genetic polymorphisms influencing the expression level of genes might be underestimated.
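The undercoverage of naive intervals is the winner's curse named among the keywords, and it appears even in a toy setting. The simulation below does not use the GTEx data: it draws z-statistics for hypothetical variants, a small fraction with true effect 2.5 and the rest null, selects with the Benjamini-Hochberg procedure at q = 0.1, and then checks how often the naive 90% interval z ± 1.645 covers the true effect, over all variants and over the selected ones only. The effect sizes, the signal fraction, q, and the function names are assumptions for illustration.

```python
import random
from statistics import NormalDist


def benjamini_hochberg(pvals, q):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value falls under its BH line
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k = rank
    return set(order[:k])


def naive_coverage_after_selection(m=20000, frac_signal=0.05, signal=2.5,
                                   q=0.1, alpha=0.1, seed=1):
    """Hypothetical helper: simulate z-statistics, select with BH, check naive CIs."""
    rng = random.Random(seed)
    nd = NormalDist()
    n_signal = int(frac_signal * m)
    mus = [signal] * n_signal + [0.0] * (m - n_signal)  # made-up true effects
    zs = [mu + rng.gauss(0.0, 1.0) for mu in mus]
    pvals = [2.0 * nd.cdf(-abs(z)) for z in zs]  # two-sided p-values
    half = nd.inv_cdf(1.0 - alpha / 2.0)  # about 1.645 for alpha = 0.1

    def covered(i):
        return abs(zs[i] - mus[i]) <= half

    selected = benjamini_hochberg(pvals, q)
    cov_all = sum(covered(i) for i in range(m)) / m
    cov_sel = sum(covered(i) for i in selected) / max(len(selected), 1)
    return len(selected), cov_all, cov_sel


if __name__ == "__main__":
    n_sel, cov_all, cov_sel = naive_coverage_after_selection()
    print(f"selected {n_sel}; coverage over all: {cov_all:.3f}, "
          f"over selected: {cov_sel:.3f}")
```

Over all variants the naive interval covers close to 90% of the time, as it should. Over the selected variants coverage falls well below that, because selection favors estimates that overshoot their true values; in this setting a selected null is never covered, since the BH cutoff lies far beyond 1.645.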
Pages: 181-197
Page count: 17