Impact of human gene annotations on RNA-seq differential expression analysis

Cited by: 6
Authors
Hamaguchi, Yu [1]
Zeng, Chao [1, 2]
Hamada, Michiaki [1, 2, 3, 4]
Affiliations
[1] Waseda Univ, Fac Sci & Engn, Shinjuku Ku, 55N-06-10, 3-4-1 Okubo, Tokyo 1698555, Japan
[2] Waseda Univ, AIST, Computat Bio Big Data Open Innovat Lab CBBD OIL, Shinjuku Ku, 3-4-1 Okubo, Tokyo 1698555, Japan
[3] Waseda Univ, Inst Med Oriented Struct Biol, Shinjuku Ku, 2-2 Wakamatsu Cho, Tokyo 1628480, Japan
[4] Nippon Med Sch, Grad Sch Med, Bunkyo Ku, 1-1-5 Sendagi, Tokyo 1138602, Japan
Keywords
RNA-seq; Differential expression analysis; Benchmarking; Gene annotation; Quantification; Transcriptome; Discovery; Alignment; HISAT
DOI
10.1186/s12864-021-08038-7
Chinese Library Classification (CLC) numbers
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
Background
Differential expression (DE) analysis of RNA-seq data typically depends on gene annotations. Several sets of gene annotations are available for the human genome, and they are continually updated, a process complicated by the development and application of high-throughput sequencing technologies. However, the impact of the complexity of gene annotations on DE analysis remains unclear.
Results
Using "mappability", a metric of the complexity of gene annotation, we compared three distinct human gene annotations (GENCODE, RefSeq, and NONCODE) and evaluated how mappability affected DE analysis. We found that mappability differed significantly among the human gene annotations. We also found that increasing mappability improved the performance of DE analysis, and that the impact of mappability was mainly evident in the quantification step and propagated systematically through the downstream steps of DE analysis.
Conclusions
Using mappability, we assessed how the complexity of gene annotations affects DE analysis. Our findings indicate that the growth and complexity of gene annotations negatively impact the performance of DE analysis, suggesting that excluding unnecessary gene models from gene annotations improves the performance of DE analysis.
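The abstract treats mappability as an annotation-level measure that falls as gene models overlap in sequence. The sketch below is a minimal illustration under a simplified, assumed definition (not necessarily the one the authors used): a gene model's mappability is the fraction of its length-k substrings (stand-ins for reads of length k) that occur in no other gene model, and an annotation's mappability is the unweighted mean over its gene models. The function names `kmer_mappability` and `annotation_mappability` and the toy sequences are hypothetical.

```python
from collections import defaultdict


def kmer_mappability(genes: dict[str, str], k: int) -> dict[str, float]:
    """Fraction of each gene model's k-mers that occur in no other gene model.

    ASSUMPTION: a simplified, hypothetical definition for illustration only.
    It ignores the reverse strand, sequencing errors, and genome-level repeats.
    """
    # Map every k-mer to the set of gene models that contain it.
    owners = defaultdict(set)
    for name, seq in genes.items():
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(name)

    scores = {}
    for name, seq in genes.items():
        kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        if not kmers:
            # Sequence shorter than k: no read of length k can come from it.
            scores[name] = 0.0
            continue
        # A k-mer is "uniquely assignable" if exactly one gene model contains it.
        unique = sum(1 for km in kmers if len(owners[km]) == 1)
        scores[name] = unique / len(kmers)
    return scores


def annotation_mappability(scores: dict[str, float]) -> float:
    """Hypothetical annotation-level summary: unweighted mean over gene models."""
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    base = {"geneA": "ACGTACGGTC", "geneB": "TTGACCATGA"}
    print(kmer_mappability(base, 4))  # both gene models score 1.0

    # geneC shares its first four 4-mers with geneA.
    extended = {**base, "geneC": "ACGTACGAAA"}
    scores = kmer_mappability(extended, 4)
    # geneA and geneC each fall to 3/7 (about 0.43); geneB stays at 1.0
    print(scores, annotation_mappability(scores))
```

Adding geneC lowers the mappability of both overlapping models, a toy version of the abstract's point that a growing, more complex annotation makes reads harder to assign to a single gene. A realistic computation would use full transcript sequences, both strands, and realistic read lengths.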
Pages: 12