Efficient RNA isoform identification and quantification from RNA-Seq data with network flows

Cited by: 49
Authors
Bernard, Elsa [1, 2, 3]
Jacob, Laurent [4]
Mairal, Julien [5]
Vert, Jean-Philippe [1, 2, 3]
Affiliations
[1] Mines ParisTech, Ctr Computat Biol, F-77300 Fontainebleau, France
[2] Inst Curie, F-75248 Paris, France
[3] INSERM, U900, F-75248 Paris, France
[4] Univ Lyon 1, INRA, CNRS, Lab Biometrie & Biol Evolut,UMR5558, Villeurbanne, France
[5] INRIA Grenoble Rhone Alpes, LEAR Project Team, F-38330 Montbonnot St Martin, France
Funding
National Science Foundation (USA); European Research Council;
Keywords
ABUNDANCE ESTIMATION; TRANSCRIPTOME; EXPRESSION; SELECTION; ALGORITHM; DISCOVERY; GENOME; GRAPHS; LASSO;
DOI
10.1093/bioinformatics/btu317
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Several state-of-the-art methods for isoform identification and quantification are based on ℓ1-regularized regression, such as the Lasso. However, explicitly listing the (possibly exponentially large) set of candidate transcripts is intractable for genes with many exons. For this reason, existing approaches using the ℓ1-penalty are either restricted to genes with few exons or only run the regression algorithm on a small set of preselected isoforms. Results: We introduce a new technique called FlipFlop, which can efficiently tackle the sparse estimation problem on the full set of candidate isoforms by using network flow optimization. Our technique removes the need for a preselection step, leading to better isoform identification while keeping a low computational cost. Experiments with synthetic and real RNA-Seq data confirm that our approach is more accurate than alternative methods and one of the fastest available.
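To illustrate why explicit enumeration becomes intractable, here is a minimal Python sketch. It assumes a simplified model in which any non-empty subset of a gene's exons, kept in genomic order, counts as a candidate isoform. This is an assumption made for illustration only; the abstract does not specify how the candidate set is defined, and the function names below are hypothetical.

```python
from itertools import combinations


def count_candidate_isoforms(n_exons: int) -> int:
    """Count non-empty ordered exon subsets: 2**n - 1 (simplified model)."""
    return 2 ** n_exons - 1


def enumerate_candidate_isoforms(n_exons: int):
    """Yield each candidate as a tuple of exon indices in genomic order.

    Explicit enumeration like this is only feasible for small n_exons.
    """
    for k in range(1, n_exons + 1):
        yield from combinations(range(n_exons), k)


if __name__ == "__main__":
    # The candidate count roughly doubles with every additional exon.
    for n in (3, 10, 20, 30):
        print(n, count_candidate_isoforms(n))
```

Under this assumption, a 30-exon gene already has more than a billion candidates, which is why a regression that needs one explicit variable per candidate does not scale.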
Pages: 2447-2455
Page count: 9