BOAssembler: A Bayesian Optimization Framework to Improve RNA-Seq Assembly Performance

Times cited: 1
Authors
Mao, Shunfu [1]
Jiang, Yihan [1]
Mathew, Edwin Basil [1]
Kannan, Sreeram [1]
Affiliation
[1] Univ Washington, Seattle, WA 98195 USA
Source
ALGORITHMS FOR COMPUTATIONAL BIOLOGY (ALCOB 2020) | 2020 / Vol. 12099
Keywords
RNA-Seq; Assembly; Bayesian Optimization; Transcriptome; Quantification
DOI
10.1007/978-3-030-42266-0_15
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
High-throughput sequencing of RNA (RNA-Seq) can provide millions of short fragments of RNA transcripts from a sample. Recovering the original RNA transcripts from those fragments (RNA-Seq assembly) remains a difficult task. For example, RNA-Seq assembly tools typically require hyper-parameter tuning to achieve good performance on a particular dataset. This kind of tuning is usually unintuitive and time-consuming. Consequently, users often resort to default parameters, which do not guarantee consistently good performance across datasets. Results: Here we propose BOAssembler, a framework that enables end-to-end automatic tuning of RNA-Seq assemblers based on Bayesian Optimization principles. Experiments show that this data-driven approach is effective at improving overall assembly performance. The approach could be helpful for downstream (e.g., gene, protein, cell) analysis and, more broadly, for future bioinformatics benchmark studies.
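The abstract says only that BOAssembler tunes assemblers using Bayesian Optimization principles; it does not specify the surrogate model, acquisition function, or search space. The following is a minimal, hypothetical sketch of a generic Bayesian Optimization loop (a Gaussian-process surrogate with an expected-improvement acquisition) over one hyper-parameter rescaled to [0, 1]. `assembly_score` is a made-up synthetic stand-in for the expensive step of running an assembler with a given setting and scoring its output; all function names, the kernel length-scale, the noise term, and the candidate grid are illustrative assumptions, not the paper's method.

```python
import math
import random


def assembly_score(x):
    """HYPOTHETICAL stand-in for an expensive assembler run plus evaluation.

    x is one hyper-parameter rescaled to [0, 1]; this synthetic score peaks at 0.62.
    """
    return math.exp(-((x - 0.62) / 0.15) ** 2)


def rbf(a, b, length=0.2):
    """Squared-exponential (RBF) kernel on scalar inputs; the length-scale is assumed."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(m):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(max(m[i][i] - s, 1e-12))
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def solve_upper_t(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-5):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, ys))  # K^{-1} y
    k_star = [rbf(x_new, a) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(rbf(x_new, x_new) - sum(t * t for t in v), 1e-12)
    return mean, math.sqrt(var)


def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected improvement over the incumbent `best` (maximization)."""
    z = (mu - best - xi) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - best - xi) * cdf + sigma * pdf


def bayes_opt(objective, n_init=3, n_iter=12, seed=0):
    """Maximize `objective` on [0, 1] with a GP surrogate and EI acquisition."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n_init)]  # random initial design
    ys = [objective(x) for x in xs]
    grid = [i / 200 for i in range(201)]  # candidate settings
    for _ in range(n_iter):
        y_mean = sum(ys) / len(ys)
        centered = [y - y_mean for y in ys]  # centre scores for the zero-mean GP
        best = max(centered)

        def acquisition(x):
            mu, sd = gp_posterior(xs, centered, x)
            return expected_improvement(mu, sd, best)

        x_next = max((x for x in grid if x not in xs), key=acquisition)
        xs.append(x_next)
        ys.append(objective(x_next))  # the expensive evaluation
    i_best = max(range(len(ys)), key=lambda i: ys[i])
    return xs[i_best], ys[i_best]


best_x, best_y = bayes_opt(assembly_score)
print(f"best setting {best_x:.3f}, score {best_y:.3f}")
```

In a real setting each objective call would be a full assembly run followed by evaluation, which is why a surrogate model that proposes the next setting from only a handful of evaluations is attractive compared with exhaustive grid search.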
Pages: 188-197
Number of pages: 10