The functional false discovery rate with applications to genomics

Cited by: 25
Authors
Chen, Xiongzhi [1, 2]
Robinson, David G. [1, 3]
Storey, John D. [1]
Affiliations
[1] Princeton Univ, Lewis Sigler Inst Integrat Genom, Princeton, NJ 08544 USA
[2] Washington State Univ, Dept Math & Stat, Pullman, WA 99164 USA
[3] DataCamp, Empire State Bldg,350 5th Ave,Floor 77, New York, NY 10118 USA
Funding
National Institutes of Health (NIH), USA;
Keywords
eQTL; FDR; Functional data analysis; Genetics of gene expression; Kernel density estimation; Local false discovery rate; Multiple hypothesis testing; q-value; RNA-seq; Sequencing depth; EXPRESSION; POWER;
DOI
10.1093/biostatistics/kxz010
Chinese Library Classification number
Q [Biological Sciences];
Subject classification code
07 ; 0710 ; 09 ;
Abstract
The false discovery rate (FDR) measures the proportion of false discoveries among a set of hypothesis tests called significant. This quantity is typically estimated based on p-values or test statistics. In some scenarios, there is additional information available that may be used to more accurately estimate the FDR. We develop a new framework for formulating and estimating FDRs and q-values when an additional piece of information, which we call an "informative variable", is available. For a given test, the informative variable provides information about the prior probability that a null hypothesis is true or about the power of that particular test. The FDR is then treated as a function of this informative variable. We consider two applications in genomics. Our first application is a genetics of gene expression (eQTL) experiment in yeast where every pair of genetic marker and gene expression trait is tested for association. The informative variable in this case is the distance between each genetic marker and gene. Our second application is to detect differentially expressed genes in an RNA-seq study carried out in mice. The informative variable in this study is the per-gene read depth. The framework we develop is quite general, and it should be useful in a broad range of scientific applications.
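To make the idea of an FDR that depends on an informative variable more concrete, the following is a minimal sketch and not the authors' method. It assumes a deliberately simplified stand-in: tests are grouped into quantile bins of the informative variable `z`, and the null proportion is estimated within each bin using Storey's threshold estimator with a fixed `lam`. The function names, the binning scheme, and the simulated data are all hypothetical choices made for illustration; the abstract's keywords suggest the paper itself relies on kernel density estimation rather than bins.

```python
import numpy as np


def storey_pi0(pvals, lam=0.5):
    """Storey-type estimate of the null proportion: mean(p > lam) / (1 - lam), capped at 1."""
    pvals = np.asarray(pvals, dtype=float)
    return min(1.0, float(np.mean(pvals > lam)) / (1.0 - lam))


def binned_pi0(pvals, z, n_bins=4, lam=0.5):
    """Estimate the null proportion separately within quantile bins of z.

    Returns the bin index (0..n_bins-1) of every test and one estimate per bin.
    This binning is an illustrative simplification, not the paper's estimator.
    """
    pvals = np.asarray(pvals, dtype=float)
    z = np.asarray(z, dtype=float)
    edges = np.quantile(z, np.linspace(0.0, 1.0, n_bins + 1))
    # Interior edges only, so every test falls in a bin numbered 0..n_bins-1.
    bins = np.digitize(z, edges[1:-1], right=True)
    pi0 = np.array([storey_pi0(pvals[bins == b], lam) for b in range(n_bins)])
    return bins, pi0


# Simulated data (assumption): the prior probability of a true null rises with z.
rng = np.random.default_rng(0)
m = 20000
z = rng.uniform(size=m)
is_null = rng.uniform(size=m) < (0.5 + 0.45 * z)
# Null p-values are uniform; alternative p-values concentrate near zero.
pvals = np.where(is_null, rng.uniform(size=m), rng.beta(0.1, 1.0, size=m))

bins, pi0_hat = binned_pi0(pvals, z, n_bins=4)
print(pi0_hat)
```

In this simulation the per-bin estimates should increase with `z`, which is the kind of signal a single pooled estimate would average away. The same reasoning would apply to marker-to-gene distance or per-gene read depth as the informative variable.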
Pages: 68-81
Page count: 14