GenStore: A High-Performance In-Storage Processing System for Genome Sequence Analysis

Cited by: 35
Authors
Ghiasi, Nika Mansouri [1]
Park, Jisung [1]
Mustafa, Harun [1]
Kim, Jeremie [1]
Olgun, Ataberk [1]
Gollwitzer, Arvid [1]
Cali, Damla Senol [2]
Firtina, Can [1]
Mao, Haiyu [1]
Alserr, Nour Almadhoun [1]
Ausavarungnirun, Rachata [3]
Vijaykumar, Nandita [4]
Alser, Mohammed [1]
Mutlu, Onur [1]
Affiliations
[1] Swiss Fed Inst Technol, Zurich, Switzerland
[2] Bionano Genom, San Diego, CA USA
[3] KMUTNB, Bangkok, Thailand
[4] Univ Toronto, Toronto, ON, Canada
Source
ASPLOS '22: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS | 2022
Funding
National Research Foundation, Singapore;
Keywords
Read Mapping; Filtering; Genomics; Storage; Near-Data Processing; GENERATION; ALGORITHM; ALIGNMENT; SEARCH; MODEL;
DOI
10.1145/3503222.3507702
Chinese Library Classification (CLC) number
TP3 [Computing Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Read mapping is a fundamental step in many genomics applications. It is used to identify potential matches and differences between fragments (called reads) of a sequenced genome and an already known genome (called a reference genome). Read mapping is costly because it needs to perform approximate string matching (ASM) on large amounts of data. To address the computational challenges in genome analysis, many prior works propose various approaches such as accurate filters that select the reads within a dataset of genomic reads (called a read set) that must undergo expensive computation, efficient heuristics, and hardware acceleration. While effective at reducing the amount of expensive computation, all such approaches still require the costly movement of a large amount of data from storage to the rest of the system, which can significantly lower the end-to-end performance of read mapping in conventional and emerging genomics systems. We propose GenStore, the first in-storage processing system designed for genome sequence analysis that greatly reduces both data movement and computational overheads of genome sequence analysis by exploiting low-cost and accurate in-storage filters. GenStore leverages hardware/software co-design to address the challenges of in-storage processing, supporting reads with 1) different properties such as read lengths and error rates, which highly depend on the sequencing technology, and 2) different degrees of genetic variation compared to the reference genome, which highly depend on the genomes that are being compared. Through rigorous analysis of read mapping processes of reads with different properties and degrees of genetic variation, we meticulously design low-cost hardware accelerators and data/computation flows inside a NAND flash-based solid-state drive (SSD).
Our evaluation using a wide range of real genomic datasets shows that GenStore, when implemented in three modern NAND flash-based SSDs, significantly improves the read mapping performance of state-of-the-art software (hardware) baselines by 2.07-6.05x (1.52-3.32x) for read sets with high similarity to the reference genome and 1.45-33.63x (2.70-19.2x) for read sets with low similarity to the reference genome.
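The filtering idea in the abstract (keep only the reads that still need expensive approximate string matching) can be illustrated with a minimal software sketch. It assumes one very simple filter, namely dropping reads that match the reference exactly, and fixed-length reads; the function names `build_exact_index` and `filter_reads` are hypothetical and are not taken from the paper, whose filters run inside the SSD rather than in host Python.

```python
from typing import Iterable, List, Set, Tuple


def build_exact_index(reference: str, read_len: int) -> Set[str]:
    """Collect every substring of length read_len that occurs in the reference."""
    return {
        reference[i:i + read_len]
        for i in range(len(reference) - read_len + 1)
    }


def filter_reads(reads: Iterable[str], index: Set[str]) -> Tuple[List[str], int]:
    """Split a read set into reads that still need ASM and a count of exact matches.

    Assumption for illustration: a read that occurs verbatim in the reference
    needs no approximate string matching, so it is filtered out here.
    """
    needs_asm: List[str] = []
    exact_matches = 0
    for read in reads:
        if read in index:
            exact_matches += 1      # resolved by the cheap filter
        else:
            needs_asm.append(read)  # must go on to the expensive ASM step
    return needs_asm, exact_matches


if __name__ == "__main__":
    reference = "ACGTACGGTCA"
    reads = ["ACGT", "CGGT", "ACGA", "TTTT"]
    index = build_exact_index(reference, read_len=4)
    remaining, filtered = filter_reads(reads, index)
    print(f"filtered out {filtered} reads; {len(remaining)} still need ASM")
```

In this toy setting only the reads returned in `remaining` would be passed on for approximate string matching; the more exact matches a read set contains, the less data leaves the filter.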
Pages: 635-654
Number of pages: 20