Impact of human gene annotations on RNA-seq differential expression analysis

Cited by: 6
Authors
Hamaguchi, Yu [1]
Zeng, Chao [1,2]
Hamada, Michiaki [1,2,3,4]
Affiliations
[1] Waseda Univ, Fac Sci & Engn, Shinjuku Ku, 55N-06-10,3-4-1 Okubo, Tokyo 1698555, Japan
[2] Waseda Univ, AIST, Computat Bio Big Data Open Innovat Lab CBBD OIL, Shinjuku Ku, 3-4-1 Okubo, Tokyo 1698555, Japan
[3] Waseda Univ, Inst Med Oriented Struct Biol, Shinjuku Ku, 2-2 Wakamatsu Cho, Tokyo 1628480, Japan
[4] Nippon Med Sch, Grad Sch Med, Bunkyo Ku, 1-1-5 Sendagi, Tokyo 1138602, Japan
Keywords
RNA-seq; Differential expression analysis; Benchmarking; Gene annotation; QUANTIFICATION; TRANSCRIPTOME; DISCOVERY; ALIGNMENT; HISAT;
DOI
10.1186/s12864-021-08038-7
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Background: Differential expression (DE) analysis of RNA-seq data typically depends on gene annotations. Several sets of gene annotations are available for the human genome, and they are continually updated, a process complicated by the development and application of high-throughput sequencing technologies. However, the impact of the complexity of gene annotations on DE analysis remains unclear.
Results: Using "mappability", a metric of the complexity of gene annotation, we compared three distinct human gene annotations (GENCODE, RefSeq, and NONCODE) and evaluated how mappability affected DE analysis. We found that mappability differed significantly among the human gene annotations. We also found that increasing mappability improved the performance of DE analysis, and that the impact of mappability was mainly evident in the quantification step and propagated systematically to the downstream steps of DE analysis.
Conclusions: We used mappability to assess how the complexity of gene annotations affects DE analysis. Our findings indicate that the growth and complexity of gene annotations negatively impact the performance of DE analysis, suggesting that excluding unnecessary gene models from gene annotations improves the performance of DE analysis.
Pages: 12
Related papers
50 in total
  • [41] A scaling normalization method for differential expression analysis of RNA-seq data
    Robinson, Mark D.
    Oshlack, Alicia
    GENOME BIOLOGY, 2010, 11 (03):
  • [42] LFCseq: a nonparametric approach for differential expression analysis of RNA-seq data
    Bingqing Lin
    Li-Feng Zhang
    Xin Chen
    BMC Genomics, 15
  • [43] A scaling normalization method for differential expression analysis of RNA-seq data
    Mark D Robinson
    Alicia Oshlack
    Genome Biology, 11
  • [44] Differential Expression Analysis in RNA-seq Data Using a Geometric Approach
    Tambonis, Tiago
    Boareto, Marcelo
    Leite, Vitor B. P.
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2018, 25 (11) : 1257 - 1265
  • [45] Effect of method of deduplication on estimation of differential gene expression using RNA-seq
    Klepikova, Anna V.
    Kasianov, Artem S.
    Chesnokov, Mikhail S.
    Lazarevich, Natalia L.
    Penin, Aleksey A.
    Logacheva, Maria
    PEERJ, 2017, 5
  • [46] Power analysis and sample size estimation for RNA-Seq differential expression
    Ching, Travers
    Huang, Sijia
    Garmire, Lana X.
    RNA, 2014, 20 (11) : 1684 - 1696
  • [47] GENE-Counter: A Computational Pipeline for the Analysis of RNA-Seq Data for Gene Expression Differences
    Cumbie, Jason S.
    Kimbrel, Jeffrey A.
    Di, Yanming
    Schafer, Daniel W.
    Wilhelm, Larry J.
    Fox, Samuel E.
    Sullivan, Christopher M.
    Curzon, Aron D.
    Carrington, James C.
    Mockler, Todd C.
    Chang, Jeff H.
    PLOS ONE, 2011, 6 (10):
  • [48] Impact of gene annotation choice on the quantification of RNA-seq data
    Chisanga, David
    Liao, Yang
    Shi, Wei
    BMC BIOINFORMATICS, 2022, 23 (01)
  • [49] Impact of gene annotation choice on the quantification of RNA-seq data
    David Chisanga
    Yang Liao
    Wei Shi
    BMC Bioinformatics, 23
  • [50] A fuzzy method for RNA-Seq differential expression analysis in presence of multireads
    Arianna Consiglio
    Corrado Mencar
    Giorgio Grillo
    Flaviana Marzano
    Mariano Francesco Caratozzolo
    Sabino Liuni
    BMC Bioinformatics, 17