GenStore: A High-Performance In-Storage Processing System for Genome Sequence Analysis

Cited by: 35
Authors
Ghiasi, Nika Mansouri [1]
Park, Jisung [1]
Mustafa, Harun [1]
Kim, Jeremie [1]
Olgun, Ataberk [1]
Gollwitzer, Arvid [1]
Cali, Damla Senol [2]
Firtina, Can [1]
Mao, Haiyu [1]
Alserr, Nour Almadhoun [1]
Ausavarungnirun, Rachata [3]
Vijaykumar, Nandita [4]
Alser, Mohammed [1]
Mutlu, Onur [1]
Affiliations
[1] Swiss Fed Inst Technol, Zurich, Switzerland
[2] Bionano Genom, San Diego, CA USA
[3] KMUTNB, Bangkok, Thailand
[4] Univ Toronto, Toronto, ON, Canada
Source
ASPLOS '22: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS | 2022
Funding
National Research Foundation, Singapore;
Keywords
Read Mapping; Filtering; Genomics; Storage; Near-Data Processing; GENERATION; ALGORITHM; ALIGNMENT; SEARCH; MODEL;
DOI
10.1145/3503222.3507702
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Abstract
Read mapping is a fundamental step in many genomics applications. It is used to identify potential matches and differences between fragments (called reads) of a sequenced genome and an already known genome (called a reference genome). Read mapping is costly because it needs to perform approximate string matching (ASM) on large amounts of data. To address the computational challenges in genome analysis, many prior works propose various approaches such as accurate filters that select the reads within a dataset of genomic reads (called a read set) that must undergo expensive computation, efficient heuristics, and hardware acceleration. While effective at reducing the amount of expensive computation, all such approaches still require the costly movement of a large amount of data from storage to the rest of the system, which can significantly lower the end-to-end performance of read mapping in conventional and emerging genomics systems. We propose GenStore, the first in-storage processing system designed for genome sequence analysis that greatly reduces both data movement and computational overheads of genome sequence analysis by exploiting low-cost and accurate in-storage filters. GenStore leverages hardware/software co-design to address the challenges of in-storage processing, supporting reads with 1) different properties such as read lengths and error rates, which highly depend on the sequencing technology, and 2) different degrees of genetic variation compared to the reference genome, which highly depends on the genomes that are being compared. Through rigorous analysis of read mapping processes of reads with different properties and degrees of genetic variation, we meticulously design low-cost hardware accelerators and data/computation flows inside a NAND flash-based solid-state drive (SSD).
Our evaluation using a wide range of real genomic datasets shows that GenStore, when implemented in three modern NAND flash-based SSDs, significantly improves the read mapping performance of state-of-the-art software (hardware) baselines by 2.07-6.05x (1.52-3.32x) for read sets with high similarity to the reference genome and 1.45-33.63x (2.70-19.2x) for read sets with low similarity to the reference genome.
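The filter-then-match idea in the abstract can be illustrated with a small software sketch. This is not GenStore's in-storage design: the exact-match filter, the function names (`edit_distance`, `best_match`, `map_reads`), and the `max_errors` threshold are all assumptions chosen for illustration. The sketch only shows why a cheap filter reduces the number of reads that reach the expensive approximate string matching (ASM) step.

```python
def edit_distance(a, b):
    """Levenshtein distance using a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def best_match(read, reference):
    """Expensive ASM step (illustrative): scan every same-length window."""
    n = len(read)
    best_pos, best_dist = -1, n + 1
    for pos in range(len(reference) - n + 1):
        dist = edit_distance(read, reference[pos:pos + n])
        if dist < best_dist:
            best_pos, best_dist = pos, dist
    return best_pos, best_dist


def map_reads(reads, reference, max_errors=1):
    """Run a cheap filter first; only unresolved reads go through ASM."""
    mapped, asm_calls = {}, 0
    for read in reads:
        pos = reference.find(read)  # hypothetical filter: exact match
        if pos >= 0:
            mapped[read] = (pos, 0)
            continue
        asm_calls += 1
        pos, dist = best_match(read, reference)
        if dist <= max_errors:
            mapped[read] = (pos, dist)
    return mapped, asm_calls


reference = "ACGTACGTTAGC"
reads = ["ACGT", "GTTA", "CGAA", "TTTT"]
mapped, asm_calls = map_reads(reads, reference)
print(mapped, asm_calls)
```

In this toy input, two of the four reads are resolved by the filter alone, so ASM runs only twice. A real read mapper would use an index rather than a linear scan, and the abstract's point is that performing such filtering inside the SSD also avoids moving the filtered reads out of storage.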
Pages: 635-654
Page count: 20