ClaPIM: Scalable Sequence Classification Using Processing-in-Memory

Cited by: 2
Authors
Khalifa, Marcel [1]
Hoffer, Barak [1]
Leitersdorf, Orian [1]
Hanhan, Robert [1]
Perach, Ben [1]
Yavits, Leonid [2]
Kvatinsky, Shahar [1]
Affiliations
[1] Technion Israel Inst Technol, Andrew & Erna Viterbi Fac Elect & Comp Engn, IL-3200003 Haifa, Israel
[2] Bar Ilan Univ, Alexander Kofkin Fac Engn, IL-5290002 Ramat Gan, Israel
Funding
European Research Council
Keywords
Accelerator; approximate string matching; bioinformatics; deoxyribonucleic acid (DNA) classification; processing-in-memory (PIM)
DOI
10.1109/TVLSI.2023.3293038
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Deoxyribonucleic acid (DNA) sequence classification is a fundamental task in computational biology with vast implications for applications such as disease prevention and drug design. Therefore, fast high-quality sequence classifiers are significantly important. This article introduces ClaPIM, a scalable DNA sequence classification architecture based on the emerging concept of hybrid in-crossbar and near-crossbar memristive processing-in-memory (PIM). We enable efficient and high-quality classification by uniting the filter and search stages within a single algorithm. Specifically, we propose a custom filtering technique that drastically narrows the search space and a search approach that facilitates approximate string matching through a distance function. ClaPIM is the first PIM architecture for scalable approximate string matching that benefits from the high density of memristive crossbar arrays and the massive computational parallelism of PIM. Compared with Kraken2, a state-of-the-art software classifier, ClaPIM provides significantly higher classification quality (up to 20x improvement in F1 score) and also demonstrates a 1.8x throughput improvement. Compared with edit distance tolerant approximate matching (EDAM), a recently proposed static random-access memory (SRAM)-based accelerator that is restricted to small datasets, we observe both a 30.4x improvement in normalized throughput per area and a 7% increase in classification precision.
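The filter-then-search idea described in the abstract can be illustrated with a minimal software sketch. This is not the ClaPIM algorithm: the abstract does not specify the filter or the distance function, so the sketch assumes a simple k-mer prefix filter and Hamming distance, and the names `hamming`, `build_index`, and `classify` are hypothetical.

```python
from collections import defaultdict


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def build_index(references, k, prefix_len):
    """Group every reference k-mer under its first prefix_len bases.

    references maps a class label to its reference sequence.
    The prefix bucket plays the role of the filter (an assumption,
    not the paper's filter).
    """
    index = defaultdict(list)
    for label, seq in references.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            index[kmer[:prefix_len]].append((label, kmer))
    return index


def classify(read, index, k, prefix_len, max_dist):
    """Return the label with the most approximate k-mer hits, or None."""
    votes = defaultdict(int)
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        # Filter stage: only reference k-mers sharing the prefix are searched.
        for label, ref_kmer in index.get(kmer[:prefix_len], ()):
            # Search stage: approximate match under a distance threshold.
            if hamming(kmer, ref_kmer) <= max_dist:
                votes[label] += 1
    return max(votes, key=votes.get) if votes else None


if __name__ == "__main__":
    refs = {"A": "ACGTACGTAC", "B": "TTGGCCAATT"}
    idx = build_index(refs, k=6, prefix_len=3)
    print(classify("ACGTACCTAC", idx, k=6, prefix_len=3, max_dist=1))
```

In this sketch the filter shrinks the candidate set before any distance is computed, which is the property the abstract attributes to uniting the two stages. A prefix filter misses candidates whose mismatch falls inside the prefix, so it trades some sensitivity for a smaller search space.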
Pages: 1347-1357
Number of pages: 11