Advancing next-generation sequencing data analytics with scalable distributed infrastructure

Cited by: 2
Authors
Kim, Joohyun [1]
Maddineni, Sharath [1]
Jha, Shantenu [1, 2]
Affiliations
[1] Louisiana State Univ, Ctr Computat & Technol, Baton Rouge, LA 70803 USA
[2] Rutgers State Univ, Piscataway, NJ 08854 USA
Funding
U.S. National Science Foundation;
Keywords
SHORT READS; CHIP-SEQ; GENOME; ALIGNMENT; ALGORITHMS; FRAMEWORK; TOOL;
DOI
10.1002/cpe.3013
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
With the emergence of popular next-generation sequencing (NGS)-based genome-wide protocols such as chromatin immunoprecipitation followed by sequencing (ChIP-Seq) and RNA-Seq, there is a growing need for research and infrastructure that support the effective analysis of NGS data. Such research and infrastructure do not replace but complement algorithmic advances and developments in analyzing NGS data. We present a runtime environment, Distributed Application Runtime Environment, that supports the scalable, flexible, and extensible composition of capabilities that cover the primary requirements of NGS-based analytics. In this work, we use BFAST as a representative stand-alone tool for NGS data analysis and a ChIP-Seq pipeline as a representative pipeline-based approach to analyze the computational requirements. We analyze the performance characteristics of BFAST and characterize its dependence on different input parameters. The computational complexity of genome-wide mapping using BFAST depends, among other factors, on the size of the reference genome and the data size of the short reads. Characterizing the performance suggests that the mapping benefits from both scaling-up (increased fine-grained parallelism) and scaling-out (task-level parallelism, local and distributed). For certain problem instances, scaling-out can be a more efficient approach than scaling-up. On the basis of investigations using the pipeline for ChIP-Seq, we also discuss the importance of dynamic execution of tasks. Copyright © 2013 John Wiley & Sons, Ltd.
Pages: 894-906
Page count: 13