Gene Set Analysis Using Spatial Statistics

Cited by: 0
Authors
Riffo-Campos, Angela L. [1, 2]
Ayala, Guillermo [2 ]
Montes, Francisco [2 ]
Affiliations
[1] Univ La Frontera, Ctr Excelencia Modelac & Comp Cient, Temuco 4780000, Chile
[2] Univ Valencia, Dept Estadist & Invest Operat, Avda Vicent Andres Estelles 1, Burjassot 46100, Spain
Keywords
colorectal cancer; RNA-Seq; paired samples; spatial point pattern; enrichment analysis; expression; point; risk
DOI
10.3390/math9050521
Chinese Library Classification
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
Differential gene expression analysis studies the possible association between gene expression, measured with technologies such as DNA microarrays or RNA-Seq, and the phenotype. This can be performed marginally for each gene (differential gene expression) or using a gene set collection (gene set analysis). A prior (marginal) per-gene analysis of differential expression is usually performed to obtain a set of significant genes or the marginal p-values used later in studying the association between phenotype and gene expression. This paper proposes the use of spatial statistics methods to test gene set differential expression using paired samples of RNA-Seq counts. This approach is not based on a prior per-gene differential expression analysis. Instead, we compare the paired counts within each sample/control pair using a binomial test. Each pair yields one p-value per gene, so each gene expression profile is transformed into a vector of p-values, which is treated as an event of a point pattern. These events form the first component of a bivariate point pattern. The second component is generated by applying two different randomization distributions to the correspondence between samples and treatment. The self-contained null hypothesis considered in gene set analysis can then be formulated, in terms of the associated point pattern, as a random labeling of the bivariate point pattern. The gene sets were defined by Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The proposed methodology was tested on four RNA-Seq datasets of colorectal cancer (CRC) patients, and the results were compared with those obtained using the edgeR-GOseq pipeline. The proposed methodology proved consistent at both the biological and statistical levels, in particular when using the Cuzick and Edwards test with one realization of the second component and the between-pair distribution.
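The two main ingredients named in the abstract, the per-pair binomial test and a random-labeling test based on the Cuzick and Edwards nearest-neighbour statistic, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the two-sided test against a success probability of 0.5, the Euclidean distance between p-value vectors, and the permutation scheme are all assumptions made for illustration. The abstract does not detail how the second component of the point pattern is generated, so the sketch simply takes the 0/1 labels as input.

```python
import math
import random


def binomial_pvalue(case_count, control_count):
    """Exact two-sided binomial test of one paired count against p = 0.5 (assumed null)."""
    n = case_count + control_count
    k = min(case_count, control_count)
    # The null distribution is symmetric, so double the smaller tail and cap at 1.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def pvalue_profile(case_counts, control_counts):
    """Turn one gene's paired counts into a vector of p-values (one point of the pattern)."""
    return [binomial_pvalue(x, y) for x, y in zip(case_counts, control_counts)]


def knn_indices(points, k):
    """For each point, the indices of its k nearest neighbours (Euclidean distance)."""
    neighbours = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbours.append(others[:k])
    return neighbours


def cuzick_edwards_tk(labels, neighbours):
    """T_k: over all points labelled 1, count their k nearest neighbours labelled 1."""
    return sum(labels[j] for i, nb in enumerate(neighbours) if labels[i] for j in nb)


def random_labelling_test(points, labels, k=1, n_perm=999, seed=0):
    """Monte Carlo p-value for T_k under random labeling (labels shuffled over fixed points)."""
    rng = random.Random(seed)
    neighbours = knn_indices(points, k)
    observed = cuzick_edwards_tk(labels, neighbours)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if cuzick_edwards_tk(shuffled, neighbours) >= observed:
            hits += 1
    return observed, (1 + hits) / (1 + n_perm)


# Hypothetical toy data: three genes, each with three paired (case, control) counts.
profiles = [
    pvalue_profile([0, 1, 2], [10, 12, 9]),
    pvalue_profile([5, 6, 4], [5, 6, 5]),
    pvalue_profile([20, 3, 8], [2, 15, 8]),
]
```

A large value of T_k relative to its permutation distribution would indicate that points carrying label 1 cluster among themselves, which is evidence against random labeling.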
Pages: 1 / 13
Page count: 13