GenStore: A High-Performance In-Storage Processing System for Genome Sequence Analysis

Cited by: 35
Authors
Ghiasi, Nika Mansouri [1]
Park, Jisung [1]
Mustafa, Harun [1]
Kim, Jeremie [1]
Olgun, Ataberk [1]
Gollwitzer, Arvid [1]
Cali, Damla Senol [2]
Firtina, Can [1]
Mao, Haiyu [1]
Alserr, Nour Almadhoun [1]
Ausavarungnirun, Rachata [3]
Vijaykumar, Nandita [4]
Alser, Mohammed [1]
Mutlu, Onur [1]
Affiliations
[1] Swiss Fed Inst Technol, Zurich, Switzerland
[2] Bionano Genom, San Diego, CA USA
[3] KMUTNB, Bangkok, Thailand
[4] Univ Toronto, Toronto, ON, Canada
Source
ASPLOS '22: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS | 2022
Funding
National Research Foundation, Singapore;
Keywords
Read Mapping; Filtering; Genomics; Storage; Near-Data Processing; GENERATION; ALGORITHM; ALIGNMENT; SEARCH; MODEL;
DOI
10.1145/3503222.3507702
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Read mapping is a fundamental step in many genomics applications. It is used to identify potential matches and differences between fragments (called reads) of a sequenced genome and an already known genome (called a reference genome). Read mapping is costly because it needs to perform approximate string matching (ASM) on large amounts of data. To address the computational challenges in genome analysis, many prior works propose various approaches such as accurate filters that select the reads within a dataset of genomic reads (called a read set) that must undergo expensive computation, efficient heuristics, and hardware acceleration. While effective at reducing the amount of expensive computation, all such approaches still require the costly movement of a large amount of data from storage to the rest of the system, which can significantly lower the end-to-end performance of read mapping in conventional and emerging genomics systems. We propose GenStore, the first in-storage processing system designed for genome sequence analysis that greatly reduces both data movement and computational overheads of genome sequence analysis by exploiting low-cost and accurate in-storage filters. GenStore leverages hardware/software co-design to address the challenges of in-storage processing, supporting reads with 1) different properties such as read lengths and error rates, which highly depend on the sequencing technology, and 2) different degrees of genetic variation compared to the reference genome, which highly depends on the genomes that are being compared. Through rigorous analysis of read mapping processes of reads with different properties and degrees of genetic variation, we meticulously design low-cost hardware accelerators and data/computation flows inside a NAND flash-based solid-state drive (SSD).
Our evaluation using a wide range of real genomic datasets shows that GenStore, when implemented in three modern NAND flash-based SSDs, significantly improves the read mapping performance of state-of-the-art software (hardware) baselines by 2.07-6.05x (1.52-3.32x) for read sets with high similarity to the reference genome and 1.45-33.63x (2.70-19.2x) for read sets with low similarity to the reference genome.
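To make the abstract's terms concrete, the following is a minimal Python sketch of the two ideas it names: approximate string matching of a read against a reference, and a cheap filter that decides which reads need that expensive step. It is an illustration under stated assumptions, not GenStore's filter design or hardware data flow, which this record does not describe. The function names (`min_edit_distance`, `kmers`, `filter_reads`) and the shared k-mer criterion are hypothetical choices made for this example.

```python
def min_edit_distance(read, reference):
    """Smallest edit distance between `read` and any substring of `reference`.

    Illustrative approximate string matching (ASM) by dynamic programming;
    the cost grows with len(read) * len(reference).
    """
    # prev[j]: best cost of aligning the read prefix processed so far
    # so that the alignment ends at reference position j.
    # Row 0 is all zeros because an alignment may start anywhere for free.
    prev = [0] * (len(reference) + 1)
    for i, r in enumerate(read, start=1):
        curr = [i] + [0] * len(reference)
        for j, c in enumerate(reference, start=1):
            curr[j] = min(
                prev[j - 1] + (r != c),  # match (cost 0) or substitution (cost 1)
                prev[j] + 1,             # extra character in the read
                curr[j - 1] + 1,         # reference character skipped
            )
        prev = curr
    # The alignment may also end anywhere in the reference.
    return min(prev)


def kmers(seq, k):
    """Set of all length-k substrings of `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def filter_reads(reads, reference, k):
    """Hypothetical filter: keep reads sharing at least one k-mer with the reference.

    Only the reads returned here would go on to the expensive ASM step.
    """
    ref_kmers = kmers(reference, k)
    return [read for read in reads if kmers(read, k) & ref_kmers]


if __name__ == "__main__":
    reference = "ACGTACGTTAGC"
    reads = ["ACGT", "ACTT", "GGGG"]
    for read in filter_reads(reads, reference, k=3):
        print(read, min_edit_distance(read, reference))
```

In this sketch the filter is a set lookup per read, while ASM fills a table whose size is the product of the two sequence lengths, which is why discarding reads before ASM saves work. A shared k-mer test is only a heuristic: with a large `k`, it can drop reads that would still align with a few edits.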
Pages: 635-654
Page count: 20