A Distributed System for Fast Alignment of Next-Generation Sequencing Data

Cited by: 0
Authors
Srimani, Jaydeep K. [1]
Wu, Po-Yen [1]
Phan, John H. [2,3]
Wang, May D. [2,3]
Affiliations
[1] Georgia Inst Technol, Dept Elect & Comp Engn, Atlanta, GA 30332 USA
[2] Georgia Inst Technol, Wallace H Coulter Biomed Engn Dept, Atlanta, GA 30332 USA
[3] Emory Univ, Atlanta, GA 30332 USA
Source
2010 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE WORKSHOPS (BIBMW) | 2010
Funding
National Institutes of Health (NIH), USA
Keywords
BOINC; distributed computing; next-generation sequencing; gene expression analysis; read alignment
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification code
081203; 0835
Abstract
We developed a scalable distributed computing system using the Berkeley Open Infrastructure for Network Computing (BOINC) to align next-generation sequencing (NGS) data quickly and accurately. NGS technology is emerging as a promising platform for gene expression analysis due to its high sensitivity compared to traditional genomic microarray technology. However, despite these benefits, NGS datasets can be prohibitively large, requiring significant computing resources to obtain sequence alignment results. Moreover, as the data and alignment algorithms become more prevalent, it will become necessary to examine the effect of the multitude of alignment parameters on various NGS systems. We validate the distributed software system by (1) computing simple timing results to show the speed-up gained by using multiple computers, (2) optimizing alignment parameters using simulated NGS data, and (3) computing NGS expression levels for a single biological sample using optimal parameters and comparing these expression levels to those of a microarray sample. Results indicate that the distributed alignment system achieves an approximately linear speed-up and correctly distributes sequence data to, and gathers alignment results from, multiple compute clients.
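The distribute-and-gather pattern and the speed-up measurement mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not use the BOINC API; the function names (`split_reads`, `gather`, `speedup`, `efficiency`) and the assumption that reads are split into contiguous, near-equal work units are hypothetical and chosen only for illustration.

```python
from typing import List


def split_reads(reads: List[str], n_clients: int) -> List[List[str]]:
    """Partition reads into n_clients contiguous, near-equal work units.

    Hypothetical helper: the abstract does not say how work units are formed.
    """
    if n_clients < 1:
        raise ValueError("n_clients must be at least 1")
    base, extra = divmod(len(reads), n_clients)
    units: List[List[str]] = []
    start = 0
    for i in range(n_clients):
        # The first `extra` units receive one additional read each.
        size = base + (1 if i < extra else 0)
        units.append(reads[start:start + size])
        start += size
    return units


def gather(results: List[List[str]]) -> List[str]:
    """Concatenate per-client results, preserving the original read order."""
    return [item for unit in results for item in unit]


def speedup(t_single: float, t_distributed: float) -> float:
    """Ratio of single-machine run time to distributed run time."""
    return t_single / t_distributed


def efficiency(t_single: float, t_distributed: float, n_clients: int) -> float:
    """Speed-up per client; a value near 1.0 indicates near-linear scaling."""
    return speedup(t_single, t_distributed) / n_clients
```

Under these assumptions, a run that takes 120 time units on one machine and 30 on four clients has a speed-up of 4.0 and an efficiency of 1.0, which is what "approximately linear" scaling looks like in this measure.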
Pages: 579-584
Page count: 6