Monte Carlo randomization tests for large-scale abundance datasets on the GPU

Cited by: 8
Authors
Van Hemert, John L. [1, 2, 3]
Dickerson, Julie A. [1, 2, 3]
Affiliations
[1] Iowa State Univ, Bioinformat & Computat Biol Program, Ames, IA 50011 USA
[2] Iowa State Univ, Elect & Comp Engn Dept, Ames, IA 50011 USA
[3] Iowa State Univ, Virtual Real Applicat Ctr, Ames, IA 50011 USA
Funding
National Science Foundation (USA)
Keywords
GPGPU; GPU; Parallelization; Microarray; Next-Generation sequencing; Abundance data; Metabolomics; Non-parametric test; Monte Carlo simulation
DOI
10.1016/j.cmpb.2010.04.010
Chinese Library Classification number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Statistical tests are often performed to discover which experimental variables are reacting to specific treatments. Time-series statistical models usually require the researcher to make assumptions about the distribution of measured responses, and these assumptions may not hold. Randomization tests can be applied to data in order to generate null distributions non-parametrically. However, large numbers of randomizations are required for the precise p-values needed to control false discovery rates. When testing tens of thousands of variables (genes, chemical compounds, or otherwise), significant q-value cutoffs can be extremely small (on the order of 10^-5 to 10^-8). This requires high-precision p-values, which in turn require large numbers of randomizations. The NVIDIA® Compute Unified Device Architecture® (CUDA®) platform for General Programming on the Graphics Processing Unit (GPGPU) was used to implement an application that performs high-precision randomization tests via Monte Carlo sampling, for quickly screening custom test statistics in experiments with large numbers of variables, such as microarrays, Next-Generation sequencing read counts, chromatographical signals, or other abundance measurements. The software has been shown to achieve a speedup of more than 12-fold on a Graphics Processing Unit (GPU) when compared to a powerful Central Processing Unit (CPU). The main limitation is concurrent random access of shared memory on the GPU. The software is available from the authors. © 2011 Elsevier Ireland Ltd. All rights reserved.
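The link the abstract draws between p-value precision and the number of randomizations can be illustrated with a minimal CPU-side sketch. This is not the authors' CUDA implementation: the function names, the difference-of-means statistic, and the add-one p-value estimator are all assumptions chosen for illustration. With the add-one estimator, N randomizations cannot yield a p-value smaller than 1/(N + 1), which is why cutoffs near 10^-8 call for roughly 10^8 randomizations per variable.

```python
import random


def mean_difference(values, n_treated):
    """Hypothetical test statistic: mean of the first group minus mean of the rest."""
    treated = values[:n_treated]
    control = values[n_treated:]
    return sum(treated) / len(treated) - sum(control) / len(control)


def randomization_p_value(treated, control, n_randomizations=10_000, seed=0):
    """Two-sided Monte Carlo randomization test for a single variable.

    Group labels are reassigned at random by shuffling the pooled
    measurements; the p-value is the share of shuffles whose statistic
    is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    observed = abs(mean_difference(pooled, n_treated))

    extreme = 0
    for _ in range(n_randomizations):
        rng.shuffle(pooled)  # in-place random permutation of the pooled data
        if abs(mean_difference(pooled, n_treated)) >= observed:
            extreme += 1

    # Add-one estimator (an assumption of this sketch): the result is never 0,
    # and its smallest possible value is 1 / (n_randomizations + 1).
    return (extreme + 1) / (n_randomizations + 1)


def min_randomizations(exponent):
    """Smallest N for which 1 / (N + 1) can reach 10**(-exponent)."""
    return 10 ** exponent - 1


if __name__ == "__main__":
    p = randomization_p_value([10, 11, 12, 13, 14], [0, 1, 2, 3, 4], 2000)
    print("p-value:", p)
    print("randomizations needed to resolve 1e-8:", min_randomizations(8))
```

Each variable (gene, compound, and so on) would need its own loop of this kind, and the loops are independent of one another, which is what makes the workload a natural candidate for GPU parallelization.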
Pages: 80-86
Page count: 7