TALCO: Tiling Genome Sequence Alignment using Convergence of Traceback Pointers

被引:5
作者
Walia, Sumit [1 ]
Ye, Cheng [1 ]
Bera, Arkid [1 ]
Lodhavia, Dhruvi [1 ]
Turakhia, Yatish [1 ]
机构
[1] Univ Calif San Diego, Dept Elect & Comp Engn, La Jolla, CA 92093 USA
来源
2024 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, HPCA 2024 | 2024年
关键词
SMITH-WATERMAN IMPLEMENTATION; ALGORITHM; DNA; ACCELERATOR; SEARCH;
D O I
10.1109/HPCA57654.2024.00044
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Pairwise sequence alignment is one of the most fundamental and computationally intensive steps in genome analysis. With the improving costs and throughput of third-generation sequencing technologies and the growing availability of whole-genome datasets, longer alignments are becoming more common in the field of bioinformatics. However, the high memory demands of long alignments create significant obstacles to hardware acceleration. Banding techniques allow recovering high-quality alignments with lower memory, but they also require more memory for long alignments than what is typically available on-chip in hardware accelerators. Recently, tiling-based hardware accelerators have made remarkable strides in accelerating sequence alignment, achieving three to four orders of magnitude improvement in alignment throughput over software tools without any restrictions on alignment length. However, it is crucial to note that existing tiling heuristics can cause the alignment quality to degrade, which is a critical concern for the wider adoption of accelerators in the field of bioinformatics. To address this issue, this paper describes TALCO - a novel method for tiling long sequence alignments, that, similar to prior tiling techniques, maintains a constant memory footprint during the acceleration step independent of alignment length. However, unlike previous techniques, TALCO also ensures optimal alignments under banding constraints. TALCO does this by leveraging the convergence of traceback paths beyond a tile to a single point on the boundary of that tile - a strategy that generalizes well to a broad set of sequence alignment algorithms. We demonstrate the advantages of TALCO by applying it to two different and widely-used banded sequence alignment algorithms, X-Drop and WFA-Adapt. To the best of our knowledge, this is the first time that a tiling technique is being applied to a non-classical algorithm for sequence alignment, such as WFA-Adapt. The TALCO tiling strategy is beneficial to both software and hardware. When implemented in software, the TALCO strategy reduces the memory requirements for X-Drop and WFA-Adapt algorithms by up to 39x and 57x, respectively, and when implemented as ASIC accelerator, it provides up to 1,900x and 2,000x improvement in alignment throughput/watt over CPU baselines implementing the same algorithms. Compared to state-of-the-art GPU and ASIC baselines implementing tiling heuristics, TALCO provides up to 50x and 1.1x improvement in alignment throughput, respectively, while also maintaining a higher alignment quality. Code availability: https://github.com/TurakhiaLab/TALCO.
引用
收藏
页码:498 / 514
页数:17
相关论文
共 112 条
[91]   Fast and sensitive mapping of nanopore sequencing reads with GraphMap [J].
Sovic, Ivan ;
Sikic, Mile ;
Wilm, Andreas ;
Fenlon, Shannon Nicole ;
Chen, Swaine ;
Nagarajan, Niranjan .
NATURE COMMUNICATIONS, 2016, 7
[92]   FreePDK: An open-source variation-aware design kit [J].
Stine, James E. ;
Castellanos, Ivan ;
Wood, Michael ;
Henson, Jeff ;
Love, Fred ;
Davis, W. Rhett ;
Franzon, Paul D. ;
Bucher, Michael ;
Basavarajaiah, Sunil ;
Oh, Julie ;
Jenkal, Ravi .
2007 IEEE INTERNATIONAL CONFERENCE ON MICROELECTRONIC SYSTEMS EDUCATION, PROCEEDINGS, 2007, :173-+
[93]  
Van der Auwera Geraldine A, 2013, Curr Protoc Bioinformatics, V43, DOI [10.1002/0471250953.bi1201s43, 10.1002/0471250953.bi1110s43]
[94]  
Stratton M, 2010, A comprehensive catalogue of somatic mutations from a human cancer genome
[95]   Accelerated Seeding for Genome Sequence Alignment with Enumerated Radix Trees [J].
Subramaniyan, Arun ;
Wadden, Jack ;
Goliya, Kush ;
Ozog, Nathan ;
Wu, Xiao ;
Narayanasamy, Satish ;
Blaauw, David ;
Das, Reetuparna .
2021 ACM/IEEE 48TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2021), 2021, :388-401
[96]   GenomicsBench: A Benchmark Suite for Genomics [J].
Subramaniyan, Arun ;
Gu, Yufeng ;
Dunn, Timothy ;
Paul, Somnath ;
Vasimuddin, Md ;
Misra, Sanchit ;
Blaauw, David ;
Narayanasamy, Satish ;
Das, Reetuparna .
2021 IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE (ISPASS 2021), 2021, :1-12
[97]   Introducing difference recurrence relations for faster semi-global alignment of long sequences [J].
Suzuki, Hajime ;
Kasahara, Masahiro .
BMC BIOINFORMATICS, 2018, 19
[98]   Adapting the GACT-X Aligner to Accelerate Minimap2 in an FPGA Cloud Instance [J].
Teng, Carolina ;
Achjian, Renan Weege ;
Wang, Jiang Chau ;
Fonseca, Fernando Josepetti .
APPLIED SCIENCES-BASEL, 2023, 13 (07)
[99]   Darwin: A Genomics Coprocessor [J].
Turakhia, Yatish ;
Bejerano, Gill ;
Dally, William J. .
IEEE MICRO, 2019, 39 (03) :29-37
[100]   Darwin-WGA: A Co-processor Provides Increased Sensitivity in Whole Genome Alignments with High Speedup [J].
Turakhia, Yatish ;
Goenka, Sneha D. ;
Bejerano, Gill ;
Dally, William J. .
2019 25TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA), 2019, :359-372