Spark-based parallelization of basic local alignment search tool

Cited by: 2
Authors
Wang H. [1, 2]
Li L. [1, 2]
Zhou C. [1, 2]
Lin H. [1, 2]
Deng D. [1, 2]
Affiliations
[1] College of Data Science and Application, Inner Mongolia University of Technology, Hohhot
[2] Inner Mongolia Autonomous Region Engineering and Technology Research Center of Big Data Based Software Service, Inner Mongolia University of Technology, Hohhot
Keywords
Basic local alignment search tool; Parallelization; Sequence alignment; Spark; Speedup
DOI
10.7546/ijba.2020.24.1.000767
Abstract
Sequence alignment is a key step in bioinformatics analysis. The basic local alignment search tool (BLAST) is a popular sequence alignment algorithm with high accuracy. However, BLAST is inefficient at comparing and analyzing massive amounts of gene sequencing data. To solve this problem, this paper designs a distributed parallel BLAST method called SparkBLAST, based on the big data framework Spark. Under the in-memory computing framework Spark, SparkBLAST identifies the sequence alignment task, divides the sequence dataset, and compares the sequence data. Apache Hadoop YARN was adopted for task scheduling and resource allocation. Finally, SparkBLAST was compared with standalone BLAST through experiments. The results show that SparkBLAST achieved a speedup ratio of 3.95 without sacrificing accuracy. In other words, SparkBLAST substantially outperforms standalone BLAST in computational efficiency. The research findings provide bioinformatics researchers with a highly efficient tool for sequence alignment. © 2020 by the authors.
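The abstract says the sequence dataset is divided before the pieces are compared in parallel, but it does not say how the division is done. The following is a minimal sketch of one plausible approach, not the authors' implementation: it parses FASTA text and greedily spreads the records over a fixed number of partitions so that total sequence length is roughly balanced. The function names (`parse_fasta`, `split_balanced`, `speedup`) and the balancing strategy are assumptions made for illustration. The sketch uses plain Python with no Spark dependency; in a Spark job each partition would presumably be handed to a separate worker. The `speedup` helper encodes the usual definition of speedup ratio, standalone time divided by parallel time.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (header without '>', concatenated sequence)


def parse_fasta(lines: Iterable[str]) -> List[Record]:
    """Parse FASTA-formatted lines into (header, sequence) records."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_balanced(records: List[Record], n_parts: int) -> List[List[Record]]:
    """Hypothetical partitioner (an assumption, not from the paper).

    Greedy heuristic: take records longest first and place each one in the
    partition with the smallest total sequence length so far.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    parts: List[List[Record]] = [[] for _ in range(n_parts)]
    loads = [0] * n_parts
    for rec in sorted(records, key=lambda r: len(r[1]), reverse=True):
        lightest = loads.index(min(loads))
        parts[lightest].append(rec)
        loads[lightest] += len(rec[1])
    return parts


def speedup(t_standalone: float, t_parallel: float) -> float:
    """Speedup ratio: standalone run time divided by parallel run time."""
    if t_parallel <= 0:
        raise ValueError("t_parallel must be positive")
    return t_standalone / t_parallel
```

Balancing by total sequence length rather than by record count is a design choice made here because alignment cost tends to grow with sequence length; whether SparkBLAST balances its partitions this way is not stated in the abstract.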
Pages: 87-98
Page count: 11