Alternating EM algorithm for a bilinear model in isoform quantification from RNA-seq data

Cited by: 13

Authors:

Deng, Wenjiang [1]

Mou, Tian [1]

Kalari, Krishna R. [2]

Niu, Nifang [3]

Wang, Liewei [3]

Pawitan, Yudi [1]

Trung Nghia Vu [1]

Affiliations:

[1] Karolinska Inst, Dept Med Epidemiol & Biostat, S-17177 Stockholm, Sweden

[2] Mayo Clin, Dept Hlth Sci Res, Rochester, MN 55905 USA

[3] Mayo Clin, Dept Mol Pharmacol & Expt Therapeut, Rochester, MN 55905 USA

Source:

BIOINFORMATICS | 2020 / Vol. 36 / Issue 03

Funding:

Swedish Research Council;

Keywords:

EXPRESSION; ALIGNMENT; KINASE; READS;

DOI:

10.1093/bioinformatics/btz640

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Subject classification codes:

071010 ; 081704 ;

Abstract:

Motivation: Estimation of isoform-level gene expression from RNA-seq data depends on simplifying assumptions, such as uniform read distribution, that are easily violated in real data. Such violations typically lead to biased estimates. Most existing methods provide one or more bias-correction steps, which are based on biological considerations (such as GC content) and are applied to each sample separately. The main problem is that not all biases are known.

Results: We have developed a novel method called XAEM, based on a more flexible and robust statistical model. Existing methods are essentially based on a linear model Xβ, where the design matrix X is known and is computed from the simplifying assumptions. In contrast, XAEM treats Xβ as a bilinear model in which both X and β are unknown. Joint estimation of X and β is made possible by a simultaneous analysis of multi-sample RNA-seq data. Compared to existing methods, XAEM automatically performs empirical correction of potentially unknown biases. We use an alternating expectation-maximization (AEM) algorithm that alternates between estimation of X and estimation of β. For speed, XAEM uses quasi-mapping for read alignment, which leads to a fast algorithm. Overall, XAEM performs favorably compared to recent advanced methods. For simulated datasets, XAEM obtains higher accuracy for multiple-isoform genes. In a differential-expression analysis of a real single-cell RNA-seq dataset, XAEM achieves substantially better rediscovery rates in independent validation sets.
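The alternating idea in the abstract can be illustrated with a small, generic sketch. This is not the XAEM implementation: it assumes a Poisson model for the counts and uses the standard multiplicative EM updates for that model, and the function name `alternating_em`, the toy matrices, and the choice to rescale the columns of X to sum to 1 (a bilinear product is only identifiable up to scaling) are all assumptions made for illustration. Rows of `Y` stand for hypothetical groups of reads compatible with the same isoforms, columns for samples.

```python
import numpy as np

def alternating_em(Y, X0, n_iter=200, eps=1e-12):
    """Fit Y ~ X @ B by alternating EM-style updates (illustrative sketch only).

    Y  : (n_classes, n_samples) non-negative counts.
    X0 : (n_classes, n_isoforms) starting design matrix; zeros mark read
         classes that are incompatible with an isoform and stay zero.
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X0, dtype=float).copy()
    n_iso = X.shape[1]
    # Start beta by spreading each sample's total count evenly over isoforms.
    B = np.tile(Y.sum(axis=0) / n_iso, (n_iso, 1))
    for _ in range(n_iter):
        # Beta-step: X held fixed, Poisson EM update for every sample.
        R = Y / (X @ B + eps)
        B = B * (X.T @ R) / (X.sum(axis=0)[:, None] + eps)
        # X-step: beta held fixed, the same update applied to the design matrix.
        R = Y / (X @ B + eps)
        X = X * (R @ B.T) / (B.sum(axis=1)[None, :] + eps)
        # Rescale so each column of X sums to 1; the scale moves into beta,
        # leaving the product X @ B unchanged.
        s = np.maximum(X.sum(axis=0), eps)
        X = X / s
        B = B * s[:, None]
    return X, B

# Toy data (assumed): 4 read classes, 2 isoforms, 6 samples.
rng = np.random.default_rng(0)
X_true = np.array([[0.5, 0.0], [0.3, 0.2], [0.2, 0.3], [0.0, 0.5]])
B_true = rng.uniform(50, 500, size=(2, 6))
Y = rng.poisson(X_true @ B_true).astype(float)

# Start from a uniform X over the known compatibility pattern.
support = (X_true > 0).astype(float)
X0 = support / support.sum(axis=0)
X_hat, B_hat = alternating_em(Y, X0)
```

Because the updates are multiplicative, entries of `X0` that start at zero remain zero, so the compatibility pattern between read classes and isoforms is preserved while the non-zero entries are re-estimated from all samples jointly.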

Pages: 805 - 812

Page count: 8
