TIGAR2: sensitive and accurate estimation of transcript isoform expression with longer RNA-Seq reads

Cited by: 28
Authors
Nariai, Naoki [1]
Kojima, Kaname [1]
Mimori, Takahiro [1]
Sato, Yukuto [1]
Kawai, Yosuke [1]
Yamaguchi-Kabata, Yumi [1]
Nagasaki, Masao [1]
Affiliations
[1] Tohoku Univ, Tohoku Med Megabank Org, Dept Integrat Genom, Aoba Ku, Sendai, Miyagi 9808573, Japan
Source
BMC GENOMICS | 2014 / Volume 15
Keywords
REFERENCE GENOME; ALIGNMENT; GENE; QUANTIFICATION; REVEALS;
DOI
10.1186/1471-2164-15-S10-S5
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: High-throughput RNA sequencing (RNA-Seq) enables quantification and identification of transcripts at single-base resolution. Recently, longer sequence reads have become available thanks to the development of new sequencing technologies as well as improvements in chemical reagents for next-generation sequencers. Although several computational methods have been proposed for quantifying gene expression levels from RNA-Seq data, they are not sufficiently optimized for longer reads (e.g., > 250 bp).
Results: We propose TIGAR2, a statistical method for quantifying transcript isoforms from fixed- and variable-length RNA-Seq data. Our method models substitution, deletion, and insertion errors of sequencers based on gapped alignments of reads to the reference cDNA sequences, so that sensitive read aligners such as Bowtie2 and BWA-MEM are effectively incorporated into our pipeline. In addition, a heuristic algorithm is implemented in the variational Bayesian inference for faster computation. We apply TIGAR2 to both simulated data and real data from human samples and evaluate its transcript quantification performance in comparison with existing methods.
Conclusions: TIGAR2 is a sensitive and accurate tool for quantifying transcript isoform abundances from RNA-Seq data. Our method performs better than existing methods for fixed-length reads (100 bp, 250 bp, 500 bp, and 1000 bp, both single-end and paired-end) and variable-length reads, especially for reads longer than 250 bp.
Pages: 9
Related papers
50 in total
  • [41] EMSAR: estimation of transcript abundance from RNA-seq data by mappability-based segmentation and reclustering
    Lee, Soohyun
    Seo, Chae Hwa
    Alver, Burak Han
    Lee, Sanghyuk
    Park, Peter J.
    BMC BIOINFORMATICS, 2015, 16
  • [42] 3D RNA-seq: a powerful and flexible tool for rapid and accurate differential expression and alternative splicing analysis of RNA-seq data for biologists
    Guo, Wenbin
    Tzioutziou, Nikoleta A.
    Stephen, Gordon
    Milne, Iain
    Calixto, Cristiane P. G.
    Waugh, Robbie
    Brown, John W. S.
    Zhang, Runxuan
    RNA BIOLOGY, 2021, 18 (11) : 1574 - 1587
  • [43] The transcriptome of Leishmania major in the axenic promastigote stage: transcript annotation and relative expression levels by RNA-seq
    Rastrojo, Alberto
    Carrasco-Ramiro, Fernando
    Martin, Diana
    Crespillo, Antonio
    Reguera, Rosa M.
    Aguado, Begona
    Requena, Jose M.
    BMC GENOMICS, 2013, 14
  • [44] Mixture models reveal multiple positional bias types in RNA-Seq data and lead to accurate transcript concentration estimates
    Tuerk, Andreas
    Wiktorin, Gregor
    Gueler, Serhat
    PLOS COMPUTATIONAL BIOLOGY, 2017, 13 (05)
  • [45] Fast bootstrapping-based estimation of confidence intervals of expression levels and differential expression from RNA-Seq data
    Mandric, Igor
    Temate-Tiagueu, Yvette
    Shcheglova, Tatiana
    Al Seesi, Sahar
    Zelikovsky, Alex
    Mandoiu, Ion I.
    BIOINFORMATICS, 2017, 33 (20) : 3302 - 3304
  • [46] Impact of RNA-seq data analysis algorithms on gene expression estimation and downstream prediction
    Tong, Li
    Wu, Po-Yen
    Phan, John H.
    Hassazadeh, Hamid R.
    Tong, Weida
    Wang, May D.
    SCIENTIFIC REPORTS, 2020, 10 (01)
  • [47] Log-Sum Heuristic Recovery for Automated Isoform Discovery and Abundance Estimation from RNA-Seq Data
    Yang, Yang
    Deng, Yue
    Ji, Xiangyang
    Dai, Qionghai
    2015 5TH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND TECHNOLOGY (ICIST), 2015 : 599 - 603
  • [48] Accurate assembly of multi-end RNA-seq data with Scallop2
    Zhang, Qimin
    Shi, Qian
    Shao, Mingfu
    NATURE COMPUTATIONAL SCIENCE, 2022, 2 (03) : 148 - +
  • [49] Biological classification with RNA-seq data: Can alternatively spliced transcript expression enhance machine learning classifiers?
    Johnson, Nathan T.
    Dhroso, Andi
    Hughes, Katelyn J.
    Korkin, Dmitry
    RNA, 2018, 24 (09) : 1119 - 1132
  • [50] NURD: an implementation of a new method to estimate isoform expression from non-uniform RNA-seq data
    Ma, Xinyun
    Zhang, Xuegong
    BMC BIOINFORMATICS, 2013, 14