Alternating EM algorithm for a bilinear model in isoform quantification from RNA-seq data

Cited by: 13
Authors
Deng, Wenjiang [1]
Mou, Tian [1]
Kalari, Krishna R. [2]
Niu, Nifang [3]
Wang, Liewei [3]
Pawitan, Yudi [1]
Trung Nghia Vu [1]
Affiliations
[1] Karolinska Inst, Dept Med Epidemiol & Biostat, S-17177 Stockholm, Sweden
[2] Mayo Clin, Dept Hlth Sci Res, Rochester, MN 55905 USA
[3] Mayo Clin, Dept Mol Pharmacol & Expt Therapeut, Rochester, MN 55905 USA
Funding
Swedish Research Council
Keywords
EXPRESSION; ALIGNMENT; KINASE; READS
DOI
10.1093/bioinformatics/btz640
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Estimation of isoform-level gene expression from RNA-seq data depends on simplifying assumptions, such as uniform read distribution, that are easily violated in real data. Such violations typically lead to biased estimates. Most existing methods provide bias-correction steps that are based on biological considerations, such as GC content, and are applied to each sample separately. The main problem is that not all biases are known.

Results: We have developed a novel method called XAEM based on a more flexible and robust statistical model. Existing methods are essentially based on a linear model Xβ, where the design matrix X is known and is computed from the simplifying assumptions. In contrast, XAEM treats Xβ as a bilinear model in which both X and β are unknown. Joint estimation of X and β is made possible by the simultaneous analysis of multi-sample RNA-seq data. Compared to existing methods, XAEM automatically performs empirical correction of potentially unknown biases. We use an alternating expectation-maximization (AEM) algorithm that alternates between the estimation of X and the estimation of β. For speed, XAEM uses quasi-mapping for read alignment. Overall, XAEM performs favorably compared to recent advanced methods. For simulated datasets, XAEM obtains higher accuracy for multiple-isoform genes. In a differential-expression analysis of a real single-cell RNA-seq dataset, XAEM achieves substantially better rediscovery rates in independent validation sets.
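The alternating scheme described in the abstract can be illustrated with a small sketch. This is not the XAEM implementation: it assumes, purely for illustration, that the read counts Y (read classes by samples) follow a Poisson model with mean Xβ, that the columns of X are normalized to sum to 1 for identifiability, and that a starting design matrix `X0` is supplied. The function names `alternating_em` and `poisson_loglik` are hypothetical.

```python
import numpy as np


def poisson_loglik(Y, X, beta, eps=1e-12):
    """Poisson log-likelihood of Y given mean X @ beta (constant term omitted)."""
    mu = X @ beta + eps
    return float(np.sum(Y * np.log(mu) - mu))


def alternating_em(Y, X0, n_iter=200, eps=1e-12):
    """Alternate EM-style updates of beta (X fixed) and X (beta fixed).

    Y    : (E, S) counts for E read classes in S samples
    X0   : (E, K) starting design matrix for K isoforms (an assumed input)
    Returns the estimated X (E, K), beta (K, S) and the log-likelihood trace.
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X0, dtype=float).copy()
    X /= X.sum(axis=0, keepdims=True)           # columns of X sum to 1
    K = X.shape[1]
    beta = np.tile(Y.sum(axis=0) / K, (K, 1))   # equal split across isoforms
    history = [poisson_loglik(Y, X, beta)]
    for _ in range(n_iter):
        # Update beta with X held fixed
        R = Y / (X @ beta + eps)
        beta *= (X.T @ R) / (X.sum(axis=0)[:, None] + eps)
        # Update X with beta held fixed
        R = Y / (X @ beta + eps)
        X *= (R @ beta.T) / (beta.sum(axis=1)[None, :] + eps)
        # Renormalize X and rescale beta so that X @ beta is unchanged
        scale = np.maximum(X.sum(axis=0), eps)
        X /= scale
        beta *= scale[:, None]
        history.append(poisson_loglik(Y, X, beta))
    return X, beta, history


# Simulated example: 6 read classes, 2 isoforms, 20 samples
rng = np.random.default_rng(0)
X_true = rng.dirichlet(np.ones(6), size=2).T
beta_true = rng.gamma(2.0, 200.0, size=(2, 20))
Y = rng.poisson(X_true @ beta_true)
X_start = X_true * rng.uniform(0.5, 1.5, size=X_true.shape)  # perturbed start
X_hat, beta_hat, history = alternating_em(Y, X_start)
```

Because X and β enter the model only through their product, the sketch fixes the scale by normalizing the columns of X. Several samples are needed for the X update to be informative, which mirrors the abstract's point that joint estimation relies on multi-sample data.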
Pages: 805-812
Number of pages: 8