Advancing next-generation sequencing data analytics with scalable distributed infrastructure

Times cited: 2
Authors
Kim, Joohyun [1]
Maddineni, Sharath [1]
Jha, Shantenu [1, 2]
Affiliations
[1] Louisiana State Univ, Ctr Computat & Technol, Baton Rouge, LA 70803 USA
[2] Rutgers State Univ, Piscataway, NJ 08854 USA
Source
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE | 2014 / Vol. 26 / No. 4
Funding
National Science Foundation (USA)
Keywords
SHORT READS; CHIP-SEQ; GENOME; ALIGNMENT; ALGORITHMS; FRAMEWORK; TOOL;
DOI
10.1002/cpe.3013
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
With the emergence of popular next-generation sequencing (NGS)-based genome-wide protocols such as chromatin immunoprecipitation followed by sequencing (ChIP-Seq) and RNA-Seq, there is a growing need for research and infrastructure that support the effective analysis of NGS data. Such research and infrastructure do not replace but complement algorithmic advances in analyzing NGS data. We present a runtime environment, Distributed Application Runtime Environment, that supports the scalable, flexible, and extensible composition of capabilities covering the primary requirements of NGS-based analytics. In this work, we use BFAST as a representative stand-alone tool for NGS data analysis and a ChIP-Seq pipeline as a representative pipeline-based approach to analyze the computational requirements. We analyze the performance characteristics of BFAST and its dependence on different input parameters. The computational complexity of genome-wide mapping using BFAST depends, amongst other factors, on the size of the reference genome and the volume of short-read data. Characterizing the performance suggests that the mapping benefits from both scaling-up (increased fine-grained parallelism) and scaling-out (task-level parallelism, both local and distributed). For certain problem instances, scaling-out can be a more efficient approach than scaling-up. On the basis of investigations using the ChIP-Seq pipeline, we also discuss the importance of dynamic execution of tasks. Copyright © 2013 John Wiley & Sons, Ltd.
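The scaling-out idea in the abstract (task-level parallelism over independent subsets of the short-read data) can be illustrated with a minimal sketch. It assumes that each read can be mapped independently of the others, so the read set can be partitioned into chunks that run as separate tasks and whose results are concatenated afterwards. The names `split_into_chunks`, `map_chunk`, and `scale_out` are hypothetical, and the exact-substring search is a toy stand-in, not BFAST's indexing or alignment algorithm. Threads are used only to show the task decomposition; real scale-out would run each chunk as a separate process or cluster job.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Tuple


def split_into_chunks(reads: List[str], n_chunks: int) -> List[List[str]]:
    """Partition reads into n_chunks contiguous, nearly equal chunks (order preserved)."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    size, rem = divmod(len(reads), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `rem` chunks each take one extra read.
        end = start + size + (1 if i < rem else 0)
        chunks.append(reads[start:end])
        start = end
    return chunks


def map_chunk(reference: str, chunk: List[str]) -> List[Tuple[str, Optional[int]]]:
    """Toy 'mapper' (hypothetical): first 0-based exact-match offset of each read, or None."""
    hits = []
    for read in chunk:
        pos = reference.find(read)
        hits.append((read, pos if pos >= 0 else None))
    return hits


def scale_out(reference: str, reads: List[str], n_tasks: int) -> List[Tuple[str, Optional[int]]]:
    """Map each chunk as an independent task, then gather results in input order."""
    chunks = split_into_chunks(reads, n_tasks)
    results: List[Tuple[str, Optional[int]]] = []
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        # pool.map yields results in chunk order, so concatenation preserves read order.
        for partial in pool.map(lambda c: map_chunk(reference, c), chunks):
            results.extend(partial)
    return results


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATAGGCT"
    reads = ["CGTT", "GCCG", "TTTT", "AGGC", "ACGT"]
    print(scale_out(reference, reads, n_tasks=2))
    # → [('CGTT', 5), ('GCCG', 10), ('TTTT', None), ('AGGC', 16), ('ACGT', 0)]
```

Because the chunks share nothing but the read-only reference, the result is identical for any number of tasks; only the wall-clock cost changes, which is what makes read-level partitioning a natural unit for scaling out.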
Pages: 894-906
Number of pages: 13
Related papers
  • [2] Visual programming for next-generation sequencing data analytics
    Milicchio, Franco
    Rose, Rebecca
    Bian, Jiang
    Min, Jae
    Prosperi, Mattia
    BIODATA MINING, 2016, 9
  • [3] Next-generation sequencing revolution through big data analytics
    Tripathi, Rashmi
    Sharma, Pawan
    Chakraborty, Pavan
    Varadwaj, Pritish Kumar
    FRONTIERS IN LIFE SCIENCE, 2016, 9 (02) : 119 - 149
  • [4] A Distributed System for Fast Alignment of Next-Generation Sequencing Data
    Srimani, Jaydeep K.
    Wu, Po-Yen
    Phan, John H.
    Wang, May D.
    2010 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE WORKSHOPS (BIBMW), 2010 : 579 - 584
  • [5] Next-Generation Analytics for Omics Data
    Li, Jun
    Chen, Hu
    Wang, Yumeng
    Chen, Mei-Ju May
    Liang, Han
    CANCER CELL, 2021, 39 (01) : 3 - 6
  • [6] NGSNGS: next-generation simulator for next-generation sequencing data
    Henriksen, Rasmus Amund
    Zhao, Lei
    Korneliussen, Thorfinn Sand
    BIOINFORMATICS, 2023, 39 (01)
  • [7] OncOS: Scalable and accurate next-generation sequencing analytics for precision oncology and personalized patient care
    Thompson, J. S.
    Farmery, J. H. R.
    Dobson, H.
    Frost, S.
    Cassidy, J. W.
    Patel, N.
    Thompson, H.
    Clifford, H. W.
    ANNALS OF ONCOLOGY, 2019, 30 : 583 - 583
  • [8] Next-Generation Sequencing Data Analysis
    Chowdhry, Amit K.
    JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES A-STATISTICS IN SOCIETY, 2024
  • [9] Indexing Next-Generation Sequencing data
    Jalili, Vahid
    Matteucci, Matteo
    Masseroli, Marco
    Ceri, Stefano
    INFORMATION SCIENCES, 2017, 384 : 90 - 109
  • [10] Next-generation infrastructure for next-generation people
    Tyler, N.
    Proceedings of the Institution of Civil Engineers: Smart Infrastructure and Construction, 2021, 173 (02) : 24 - 28