Impact of human gene annotations on RNA-seq differential expression analysis

Cited by: 6
Authors
Hamaguchi, Yu [1]
Zeng, Chao [1, 2]
Hamada, Michiaki [1, 2, 3, 4]
Affiliations
[1] Waseda Univ, Fac Sci & Engn, Shinjuku Ku, 55N-06-10, 3-4-1 Okubo, Tokyo 1698555, Japan
[2] Waseda Univ, AIST, Computat Bio Big Data Open Innovat Lab CBBD OIL, Shinjuku Ku, 3-4-1 Okubo, Tokyo 1698555, Japan
[3] Waseda Univ, Inst Med Oriented Struct Biol, Shinjuku Ku, 2-2 Wakamatsu Cho, Tokyo 1628480, Japan
[4] Nippon Med Sch, Grad Sch Med, Bunkyo Ku, 1-1-5 Sendagi, Tokyo 1138602, Japan
Keywords
RNA-seq; Differential expression analysis; Benchmarking; Gene annotation; QUANTIFICATION; TRANSCRIPTOME; DISCOVERY; ALIGNMENT; HISAT
DOI
10.1186/s12864-021-08038-7
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
Background
Differential expression (DE) analysis of RNA-seq data typically depends on gene annotations. Several sets of gene annotations are available for the human genome, and they are continually updated, a process that has grown more complicated with the development and application of high-throughput sequencing technologies. However, the impact of the complexity of gene annotations on DE analysis remains unclear.
Results
Using "mappability", a metric of gene annotation complexity, we compared three distinct human gene annotations, GENCODE, RefSeq, and NONCODE, and evaluated how mappability affected DE analysis. We found that mappability differed significantly among the human gene annotations. We also found that increasing mappability improved the performance of DE analysis, and that the impact of mappability was mainly evident in the quantification step and propagated systematically to the downstream steps of DE analysis.
Conclusions
We used mappability to assess how the complexity of gene annotations affects DE analysis. Our findings indicate that the growth and complexity of gene annotations negatively impact the performance of DE analysis, suggesting that excluding unnecessary gene models from gene annotations improves the performance of DE analysis.
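The abstract does not spell out how mappability is computed. To make the idea of annotation-level mappability concrete, the following is a minimal sketch under an assumed, simplified definition that is not necessarily the authors': for a read length k, mappability is the fraction of k-mer positions across all annotated transcripts whose k-mer occurs in exactly one gene, counting only the forward strand. The function names `kmer_owners` and `annotation_mappability`, the gene IDs, and the toy sequences are hypothetical and used only for illustration.

```python
from collections import defaultdict


def kmer_owners(annotation, k):
    """Map each k-mer to the set of genes whose transcripts contain it."""
    owners = defaultdict(set)
    for gene, transcripts in annotation.items():
        for seq in transcripts:
            for i in range(len(seq) - k + 1):
                owners[seq[i:i + k]].add(gene)
    return owners


def annotation_mappability(annotation, k):
    """Assumed definition: fraction of k-mer positions, over all transcripts,
    whose k-mer occurs in exactly one gene (forward strand only).
    Isoforms of the same gene may share k-mers without lowering the score."""
    owners = kmer_owners(annotation, k)
    unique = total = 0
    for transcripts in annotation.values():
        for seq in transcripts:
            for i in range(len(seq) - k + 1):
                total += 1
                if len(owners[seq[i:i + k]]) == 1:
                    unique += 1
    return unique / total if total else 0.0


# Toy annotation: hypothetical gene IDs mapped to transcript sequences.
base = {
    "GENE_A": ["ACGTACGTTT"],
    "GENE_B": ["GGGCCCAAAT"],
}
# Adding a hypothetical gene model that shares sequence with GENE_A.
extended = dict(base, GENE_C=["ACGTACGAAA"])

print(annotation_mappability(base, k=5))      # 1.0
print(annotation_mappability(extended, k=5))  # 0.666... (2/3)
```

In this toy case, the added gene model shares three 5-mers with GENE_A, so 6 of the 18 k-mer positions become ambiguous and the score drops from 1.0 to 2/3. This mirrors, in miniature, the abstract's point that adding overlapping gene models lowers mappability. A realistic version would also need to handle reverse complements, sequencing errors, and genome-wide repeats.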
Pages: 12