A Distributed System for Fast Alignment of Next-Generation Sequencing Data

Authors
Srimani, Jaydeep K. [1]
Wu, Po-Yen [1]
Phan, John H. [2, 3]
Wang, May D. [2, 3]
Affiliations
[1] Georgia Inst Technol, Dept Elect & Comp Engn, Atlanta, GA 30332 USA
[2] Georgia Inst Technol, Wallace H Coulter Biomed Engn Dept, Atlanta, GA 30332 USA
[3] Emory Univ, Atlanta, GA 30332 USA
Funding
National Institutes of Health (USA)
Keywords
BOINC; distributed computing; next-generation sequencing; gene expression analysis; read alignment
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
We developed a scalable distributed computing system using the Berkeley Open Infrastructure for Network Computing (BOINC) to align next-generation sequencing (NGS) data quickly and accurately. NGS technology is emerging as a promising platform for gene expression analysis due to its high sensitivity compared to traditional genomic microarray technology. Despite these benefits, NGS datasets can be prohibitively large, requiring significant computing resources to obtain sequence alignment results. Moreover, as the data and alignment algorithms become more prevalent, it will become necessary to examine the effect of the multitude of alignment parameters on various NGS systems. We validate the distributed software system by (1) computing simple timing results to show the speed-up gained by using multiple computers, (2) optimizing alignment parameters using simulated NGS data, and (3) computing NGS expression levels for a single biological sample using optimal parameters and comparing these expression levels to those of a microarray sample. Results indicate that the distributed alignment system achieves an approximately linear speed-up and correctly distributes sequence data to, and gathers alignment results from, multiple compute clients.
Pages: 579-584
Number of pages: 6