Gene set analysis controlling for length bias in RNA-seq experiments

Cited by: 6
Authors
Ren, Xing [1]
Hu, Qiang [2]
Liu, Song [2]
Wang, Jianmin [2]
Miecznikowski, Jeffrey C. [1]
Affiliations
[1] SUNY Univ Buffalo, Dept Biostat, Buffalo, NY 14214 USA
[2] Roswell Pk Canc Inst, Dept Biostat & Bioinformat, Buffalo, NY 14263 USA
Source
BIODATA MINING | 2017 / Vol. 10
Keywords
RNA-seq; Gene set analysis; Gene length bias; Maxmean statistic; Restandardization; DIFFERENTIAL EXPRESSION ANALYSIS; FALSE DISCOVERY RATE; BIOCONDUCTOR PACKAGE; ENRICHMENT ANALYSIS; PROSTATE-CANCER; MICROARRAY; POWERFUL; VARIABILITY; ONTOLOGY; STEROIDS;
DOI
10.1186/s13040-017-0125-9
Chinese Library Classification
Q [Biological Sciences];
Subject classification codes
07; 0710; 09
Abstract
Background: In gene set analysis, researchers are interested in determining which gene sets are significantly correlated with an outcome, e.g. disease status or treatment. With the rapid development of high-throughput sequencing technologies, ribonucleic acid sequencing (RNA-seq) has become an important alternative to traditional expression arrays in gene expression studies. Adapting existing algorithms to RNA-seq data is challenging, given the intrinsic differences between the technologies and the data they produce. In RNA-seq experiments, the measure of gene expression is correlated with gene length. This inherent correlation may cause bias in gene set analysis. Results: We develop SeqGSA, a new method for gene set analysis with length bias adjustment for RNA-seq data. It extends the R package GSA, which was designed for microarrays. Our method compares the gene set maxmean statistic against permutations, while also taking into account the statistics of the other gene sets. To adjust for the gene length bias, we implement a flexible weighted sampling scheme in the restandardization step of our algorithm. We show that our method improves the power to identify significant gene sets that are affected by the length bias. We also show that our method maintains the type I error rate when compared with another representative method for gene set enrichment testing. Conclusions: SeqGSA is a promising tool for testing the significance of gene pathways with RNA-seq data while adjusting for the inherent gene length effect. It enhances the power to detect gene sets affected by the bias and maintains the type I error rate under various situations.
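The abstract names two ingredients, the maxmean statistic and a restandardization step with weighted sampling, without giving formulas. The Python sketch below illustrates the general idea only and is not the SeqGSA implementation. The maxmean definition follows the usual form from the GSA literature (the larger of the mean positive part and the mean negative part of the gene scores, with its sign). The function names, the `weights` argument, and the way weights would be derived from gene length are all hypothetical assumptions; the abstract does not specify SeqGSA's weighting scheme.

```python
import numpy as np


def maxmean(z):
    """Maxmean of gene-level scores: the larger of the mean positive part
    and the mean negative part, returned with its sign."""
    z = np.asarray(z, dtype=float)
    pos = np.maximum(z, 0.0).mean()   # mean of positive parts
    neg = np.maximum(-z, 0.0).mean()  # mean of negative parts (as magnitudes)
    return pos if pos >= neg else -neg


def randomization_moments(z, set_size, weights=None, n_draws=1000, seed=None):
    """Mean and SD of maxmean over randomly drawn gene sets of size `set_size`.

    `weights` is a hypothetical per-gene sampling weight (for example, chosen
    so that random sets resemble the tested set in gene length). With
    weights=None, genes are drawn uniformly.
    """
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    p = None
    if weights is not None:
        w = np.asarray(weights, dtype=float)
        p = w / w.sum()  # normalize weights to sampling probabilities
    draws = np.array([
        maxmean(z[rng.choice(z.size, size=set_size, replace=False, p=p)])
        for _ in range(n_draws)
    ])
    return draws.mean(), draws.std(ddof=1)


def restandardize(score, mean, sd):
    """Center and scale a gene set score by its randomization moments."""
    return (score - mean) / sd
```

In this sketch, a gene set's observed maxmean would be passed to `restandardize` together with moments from `randomization_moments`, and the same transformation would be applied to scores computed on label-permuted data before comparing the two. Supplying length-dependent `weights` changes only which random gene sets define the reference mean and SD.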
Pages: 1 / 18
Page count: 18