TALCO: Tiling Genome Sequence Alignment using Convergence of Traceback Pointers

被引:5
作者
Walia, Sumit [1 ]
Ye, Cheng [1 ]
Bera, Arkid [1 ]
Lodhavia, Dhruvi [1 ]
Turakhia, Yatish [1 ]
机构
[1] Univ Calif San Diego, Dept Elect & Comp Engn, La Jolla, CA 92093 USA
来源
2024 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, HPCA 2024 | 2024年
关键词
SMITH-WATERMAN IMPLEMENTATION; ALGORITHM; DNA; ACCELERATOR; SEARCH;
D O I
10.1109/HPCA57654.2024.00044
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Pairwise sequence alignment is one of the most fundamental and computationally intensive steps in genome analysis. With the improving costs and throughput of third-generation sequencing technologies and the growing availability of whole-genome datasets, longer alignments are becoming more common in the field of bioinformatics. However, the high memory demands of long alignments create significant obstacles to hardware acceleration. Banding techniques allow recovering high-quality alignments with lower memory, but they also require more memory for long alignments than what is typically available on-chip in hardware accelerators. Recently, tiling-based hardware accelerators have made remarkable strides in accelerating sequence alignment, achieving three to four orders of magnitude improvement in alignment throughput over software tools without any restrictions on alignment length. However, it is crucial to note that existing tiling heuristics can cause the alignment quality to degrade, which is a critical concern for the wider adoption of accelerators in the field of bioinformatics. To address this issue, this paper describes TALCO - a novel method for tiling long sequence alignments, that, similar to prior tiling techniques, maintains a constant memory footprint during the acceleration step independent of alignment length. However, unlike previous techniques, TALCO also ensures optimal alignments under banding constraints. TALCO does this by leveraging the convergence of traceback paths beyond a tile to a single point on the boundary of that tile - a strategy that generalizes well to a broad set of sequence alignment algorithms. We demonstrate the advantages of TALCO by applying it to two different and widely-used banded sequence alignment algorithms, X-Drop and WFA-Adapt. To the best of our knowledge, this is the first time that a tiling technique is being applied to a non-classical algorithm for sequence alignment, such as WFA-Adapt. The TALCO tiling strategy is beneficial to both software and hardware. When implemented in software, the TALCO strategy reduces the memory requirements for X-Drop and WFA-Adapt algorithms by up to 39x and 57x, respectively, and when implemented as ASIC accelerator, it provides up to 1,900x and 2,000x improvement in alignment throughput/watt over CPU baselines implementing the same algorithms. Compared to state-of-the-art GPU and ASIC baselines implementing tiling heuristics, TALCO provides up to 50x and 1.1x improvement in alignment throughput, respectively, while also maintaining a higher alignment quality. Code availability: https://github.com/TurakhiaLab/TALCO.
引用
收藏
页码:498 / 514
页数:17
相关论文
共 112 条
[31]   abPOA: an SIMD-based C library for fast partial order alignment using adaptive band [J].
Gao, Yan ;
Liu, Yongzhuang ;
Ma, Yanmei ;
Liu, Bo ;
Wang, Yadong ;
Xing, Yi .
BIOINFORMATICS, 2021, 37 (15) :2209-2211
[32]   SegAlign: A Scalable GPU-Based Whole Genome Aligner [J].
Goenka, Sneha D. ;
Turakhia, Yatish ;
Paten, Benedict ;
Horowitz, Mark .
PROCEEDINGS OF SC20: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC20), 2020,
[33]   Long-read sequence assembly of the gorilla genome [J].
Gordon, David ;
Huddleston, John ;
Chaisson, Mark J. P. ;
Hill, Christopher M. ;
Kronenberg, Zev N. ;
Munson, Katherine M. ;
Malig, Maika ;
Raja, Archana ;
Fiddes, Ian ;
Hillier, LaDeana W. ;
Dunn, Christopher ;
Baker, Carl ;
Armstrong, Joel ;
Diekhans, Mark ;
Paten, Benedict ;
Shendure, Jay ;
Wilson, Richard K. ;
Haussler, David ;
Chin, Chen-Shan ;
Eichler, Evan E. .
SCIENCE, 2016, 352 (6281)
[34]   Ultrarapid Nanopore Genome Sequencing in a Critical Care Setting [J].
Gorzynski, John E. ;
Goenka, Sneha D. ;
Shafin, Kishwar ;
Jensen, Tanner D. ;
Fisk, Dianna G. ;
Grove, Megan E. ;
Spiteri, Elizabeth ;
Pesout, Trevor ;
Monlong, Jean ;
Baid, Gunjan ;
Bernstein, Jonathan A. ;
Ceresnak, Scott ;
Chang, Pi-Chuan ;
Christle, Jeffrey W. ;
Chubb, Henry ;
Dalton, Karen P. ;
Dunn, Kyla ;
Garalde, Daniel R. ;
Guillory, Joseph ;
Knowles, Joshua W. ;
Kolesnikov, Alexey ;
Ma, Michael ;
Moscarello, Tia ;
Nattestad, Maria ;
Perez, Marco ;
Ruzhnikov, Maura R. Z. ;
Samadi, Mehrzad ;
Setia, Ankit ;
Wright, Chris ;
Wusthoff, Courtney J. ;
Xiong, Katherine ;
Zhu, Tong ;
Jain, Miten ;
Sedlazeck, Fritz J. ;
Carroll, Andrew ;
Paten, Benedict ;
Ashley, Euan A. .
NEW ENGLAND JOURNAL OF MEDICINE, 2022, 386 (07) :700-+
[35]   AN IMPROVED ALGORITHM FOR MATCHING BIOLOGICAL SEQUENCES [J].
GOTOH, O .
JOURNAL OF MOLECULAR BIOLOGY, 1982, 162 (03) :705-708
[36]  
Gupta Saransh, 2019, I SYMPOS LOW POWER E, P1, DOI DOI 10.1109/islped.2019.8824830
[37]   OpenRAM: An Open-Source Memory Compiler Invited Paper [J].
Guthaus, Matthew R. ;
Stine, James E. ;
Ataei, Samira ;
Chen, Brian ;
Wu, Bin ;
Sarwar, Mehedi .
2016 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN (ICCAD), 2016,
[38]   An FPGA Accelerator of the Wavefront Algorithm for Genomics Pairwise Alignment [J].
Haghi, Abbas ;
Marco-Sola, Santiago ;
Alvarez, Lluc ;
Diamantopoulos, Dionysios ;
Hagleitner, Christoph ;
Moreto, Miquel .
2021 31ST INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE LOGIC AND APPLICATIONS (FPL 2021), 2021, :151-159
[39]   Genesis: A Hardware Acceleration Framework for Genomic Data Analysis [J].
Ham, Tae Jun ;
Bruns-Smith, David ;
Sweeney, Brendan ;
Lee, Yejin ;
Seo, Seong Hoon ;
Song, U. Gyeong ;
Oh, Young H. ;
Asanovic, Krste ;
Lee, Jae W. ;
Wills, Lisa Wu .
2020 ACM/IEEE 47TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2020), 2020, :254-267
[40]   The Path to Personalized Medicine [J].
Hamburg, Margaret A. ;
Collins, Francis S. .
NEW ENGLAND JOURNAL OF MEDICINE, 2010, 363 (04) :301-304