GenStore: A High-Performance In-Storage Processing System for Genome Sequence Analysis

Times cited: 35
Authors
Ghiasi, Nika Mansouri [1]
Park, Jisung [1]
Mustafa, Harun [1]
Kim, Jeremie [1]
Olgun, Ataberk [1]
Gollwitzer, Arvid [1]
Cali, Damla Senol [2]
Firtina, Can [1]
Mao, Haiyu [1]
Alserr, Nour Almadhoun [1]
Ausavarungnirun, Rachata [3]
Vijaykumar, Nandita [4]
Alser, Mohammed [1]
Mutlu, Onur [1]
Affiliations
[1] Swiss Fed Inst Technol, Zurich, Switzerland
[2] Bionano Genom, San Diego, CA USA
[3] KMUTNB, Bangkok, Thailand
[4] Univ Toronto, Toronto, ON, Canada
Source
ASPLOS '22: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS | 2022
Funding
National Research Foundation of Singapore;
Keywords
Read Mapping; Filtering; Genomics; Storage; Near-Data Processing; GENERATION; ALGORITHM; ALIGNMENT; SEARCH; MODEL;
DOI
10.1145/3503222.3507702
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Abstract
Read mapping is a fundamental step in many genomics applications. It is used to identify potential matches and differences between fragments (called reads) of a sequenced genome and an already known genome (called a reference genome). Read mapping is costly because it needs to perform approximate string matching (ASM) on large amounts of data. To address the computational challenges in genome analysis, many prior works propose various approaches such as accurate filters that select the reads within a dataset of genomic reads (called a read set) that must undergo expensive computation, efficient heuristics, and hardware acceleration. While effective at reducing the amount of expensive computation, all such approaches still require the costly movement of a large amount of data from storage to the rest of the system, which can significantly lower the end-to-end performance of read mapping in conventional and emerging genomics systems. We propose GenStore, the first in-storage processing system designed for genome sequence analysis that greatly reduces both data movement and computational overheads of genome sequence analysis by exploiting low-cost and accurate in-storage filters. GenStore leverages hardware/software co-design to address the challenges of in-storage processing, supporting reads with 1) different properties such as read lengths and error rates, which highly depend on the sequencing technology, and 2) different degrees of genetic variation compared to the reference genome, which highly depend on the genomes that are being compared. Through rigorous analysis of read mapping processes of reads with different properties and degrees of genetic variation, we meticulously design low-cost hardware accelerators and data/computation flows inside a NAND flash-based solid-state drive (SSD).
Our evaluation using a wide range of real genomic datasets shows that GenStore, when implemented in three modern NAND flash-based SSDs, significantly improves the read mapping performance of state-of-the-art software (hardware) baselines by 2.07-6.05x (1.52-3.32x) for read sets with high similarity to the reference genome and 1.45-33.63x (2.70-19.2x) for read sets with low similarity to the reference genome.
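To make the abstract's terms concrete, here is a toy Python sketch of a read-mapping flow in which a cheap filter decides which reads must undergo expensive approximate string matching. It is not GenStore's hardware design. It assumes that ASM is computed with Sellers' semi-global edit-distance dynamic program. It also assumes a filter with two illustrative rules: reads that occur verbatim in the reference skip ASM, and reads that share too few k-mers with the reference are discarded. The function names (`asm_distance`, `kmers`, `filter_read`, `map_reads`) and the parameters `k`, `min_shared`, and `max_edits` are hypothetical and were chosen only for this example.

```python
from typing import Dict, List, Optional, Set


def asm_distance(read: str, ref: str) -> int:
    """Fewest edits (substitutions, insertions, deletions) needed to align
    `read` to some substring of `ref` (Sellers' semi-global DP)."""
    # Row 0 is all zeros: the alignment may start at any reference position.
    prev = [0] * (len(ref) + 1)
    for i, r in enumerate(read, start=1):
        # Column 0: aligning the first i read bases to nothing costs i edits.
        curr = [i] + [0] * len(ref)
        for j, c in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # read base absent from reference (insertion)
                curr[j - 1] + 1,         # reference base skipped (deletion)
                prev[j - 1] + (r != c),  # match (0) or substitution (1)
            )
        prev = curr
    # The alignment may also end at any reference position.
    return min(prev)


def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def filter_read(read: str, ref: str, ref_kmers: Set[str], k: int, min_shared: int) -> str:
    """Hypothetical cheap filter that decides which reads need expensive ASM."""
    if read in ref:
        return "exact"     # exact match: mapping found without ASM
    if len(kmers(read, k) & ref_kmers) < min_shared:
        return "no-match"  # too few shared k-mers: treated as unmappable
    return "asm"           # candidate that must undergo ASM


def map_reads(reads: List[str], ref: str, k: int = 4, min_shared: int = 2,
              max_edits: int = 2) -> Dict[str, Optional[int]]:
    """Return each read's edit distance to the reference, or None if the read
    is filtered out or needs more than `max_edits` edits."""
    ref_kmers = kmers(ref, k)  # built once and reused for every read
    results: Dict[str, Optional[int]] = {}
    for read in reads:
        decision = filter_read(read, ref, ref_kmers, k, min_shared)
        if decision == "exact":
            results[read] = 0
        elif decision == "no-match":
            results[read] = None
        else:
            dist = asm_distance(read, ref)
            results[read] = dist if dist <= max_edits else None
    return results


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATAGGCTA"
    reads = ["ACGTTAGC", "TAGCCGTTAG", "TTTTTTTT"]
    print(map_reads(reads, reference))
    # {'ACGTTAGC': 0, 'TAGCCGTTAG': 1, 'TTTTTTTT': None}
```

In this run only `TAGCCGTTAG`, which has one substitution relative to the reference, reaches the quadratic dynamic program. The other two reads are resolved by the cheap checks. This reflects the motivation in the abstract: filters shrink the amount of expensive computation. In GenStore's setting, the filtering also happens inside the SSD, so reads resolved there need not be moved out of storage. The `min_shared` threshold used here is arbitrary. If it is set too high, a real filter of this kind would discard reads that ASM could have mapped.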
Pages: 635-654
Number of pages: 20