Accelerating K-mer Frequency Counting with GPU and Non-Volatile Memory

Cited by: 7
Authors
Cadenelli, Nicola [1, 2]
Polo, Jorda [1]
Carrera, David [1, 2]
Affiliations
[1] BSC, Barcelona, Spain
[2] UPC, BarcelonaTECH, Barcelona, Spain
Source
2017 19TH IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS (HPCC) / 2017 15TH IEEE INTERNATIONAL CONFERENCE ON SMART CITY (SMARTCITY) / 2017 3RD IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (DSS) | 2017
Funding
European Research Council;
Keywords
Scale-up; Acceleration; GPU; Non-Volatile Memory; NVM; Genomics; K-mer;
DOI
10.1109/HPCC-SmartCity-DSS.2017.57
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The emergence of Next Generation Sequencing (NGS) platforms has increased the throughput of genomic sequencing and, in turn, the amount of data that needs to be processed, requiring highly efficient computation for its analysis. In this context, modern architectures including accelerators and non-volatile memory are essential to enable the mass exploitation of these bioinformatics workloads. This paper presents a redesign of the main component of a state-of-the-art reference-free method for variant calling, SMUFIN, which has been adapted to make the most of GPUs and NVM devices. SMUFIN relies on counting the frequency of k-mers (substrings of length k) in DNA sequences, which is also a well-known problem in many bioinformatics workloads, such as genome assembly. We propose techniques to improve the efficiency of k-mer counting and to scale up workloads like SMUFIN, which used to require 16 nodes of MareNostrum 3, to a single machine with a GPU and NVM drives. Results show that although the single machine is not able to improve the time to solution of 16 nodes, its CPU time is 7.5x shorter than the aggregate CPU time of the 16 nodes, with a reduction in energy consumption of 5.5x.
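As a minimal illustration of the k-mer frequency counting problem named in the abstract, the sketch below counts every substring of length k in a single DNA string on the CPU. It is not the SMUFIN implementation and does not reflect the GPU or NVM techniques the paper proposes. The function name `count_kmers`, the example sequence, and the choice to skip windows containing non-ACGT characters are all assumptions made for illustration.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every length-k substring of `sequence` (hypothetical helper)."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    valid_bases = set("ACGT")
    counts = Counter()
    # Slide a window of width k across the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Assumption of this sketch: windows with ambiguous bases (e.g. 'N') are skipped.
        if set(kmer) <= valid_bases:
            counts[kmer] += 1
    return counts


# Example sequence chosen for illustration only.
counts = count_kmers("ACGTACGTAC", 3)
for kmer, freq in sorted(counts.items()):
    print(kmer, freq)
```

A sequence of length n yields n - k + 1 windows, so an in-memory table like this grows quickly for whole-genome inputs, which is presumably why the paper looks at GPUs and NVM drives for this step.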
Pages: 434-441
Page count: 8
Related papers
11 in total
[1]  
[Anonymous], DSK K MER COUNTING V, DOI 10.1093/bioinformatics/btt020
[2]  
Chen F, 2011, INT S HIGH PERF COMP, P266, DOI 10.1109/HPCA.2011.5749735
[3]  
Datta S, 2009, ANN IEEE SYM FIELD P, P88, DOI 10.1109/FCCM.2009.15
[4]  
Krishnamurthy P, 2004, IEEE INT CONF ASAP, P365
[5]   A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes [J].
Kurtz, Stefan ;
Narechania, Apurva ;
Stein, Joshua C. ;
Ware, Doreen .
BMC GENOMICS, 2008, 9 (1) :517
[6]   Denoising DNA deep sequencing data-high-throughput sequencing errors and their correction [J].
Laehnemann, David ;
Borkhardt, Arndt ;
McHardy, Alice Carolyn .
BRIEFINGS IN BIOINFORMATICS, 2016, 17 (01) :154-179
[7]  
Lin Ma, 2011, 2011 International Conference on Parallel Processing, P522, DOI 10.1109/ICPP.2011.27
[8]   DecGPU: distributed error correction on massively parallel graphics processing units using CUDA and MPI [J].
Liu, Yongchao ;
Schmidt, Bertil ;
Maskell, Douglas L. .
BMC BIOINFORMATICS, 2011, 12
[9]   Efficient counting of k-mers in DNA sequences using a bloom filter [J].
Melsted, Pall ;
Pritchard, Jonathan K. .
BMC BIOINFORMATICS, 2011, 12
[10]   Comprehensive characterization of complex structural variations in cancer by directly comparing genome sequence reads [J].
Moncunill, Valenti ;
Gonzalez, Santi ;
Bea, Silvia ;
Andrieux, Lise O. ;
Salaverria, Itziar ;
Royo, Cristina ;
Martinez, Laura ;
Puiggros, Montserrat ;
Segura-Wang, Maia ;
Stuetz, Adrian M. ;
Navarro, Alba ;
Royo, Romina ;
Gelpi, Josep L. ;
Gut, Ivo G. ;
Lopez-Otin, Carlos ;
Orozco, Modesto ;
Korbel, Jan ;
Campo, Elias ;
Puente, Xose S. ;
Torrents, David .
NATURE BIOTECHNOLOGY, 2014, 32 (11) :1106-1112