An Efficient Near-Bank Processing Architecture for Personalized Recommendation System

Cited by: 2
Authors
Yang, Yuqing [1]
Yang, Weidong [1]
Wang, Qin [1]
Jing, Naifeng [1]
Jiang, Jianfei [1]
Mao, Zhigang [1]
Sheng, Weiguang [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Micronano Elect, Shanghai, Peoples R China
Source
2023 28TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE, ASP-DAC | 2023
Funding
National Key Research and Development Program of China;
Keywords
Recommendation system; Near-memory processing; Mapping scheme; HMC;
DOI
10.1145/3566097.3567857
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Personalized recommendation systems consume a major share of resources in modern AI data centers. The memory-bound embedding layers, with their irregular memory access patterns, have been identified as the bottleneck of recommendation systems. To overcome these memory challenges, near-memory processing (NMP) is an effective solution that provides high bandwidth. Recent work proposes an NMP approach that accelerates recommendation models by utilizing the through-silicon via (TSV) bandwidth in 3D-stacked DRAMs. However, the total bandwidth provided by TSVs is insufficient for a batch of embedding layers processed in parallel. In this paper, we propose a near-bank processing architecture to accelerate recommendation models. By integrating compute logic near the memory banks on the DRAM dies of 3D-stacked DRAM, our architecture can exploit the enormous bank-level bandwidth, which is much higher than the TSV bandwidth. We also present a hardware/software interface for offloading embedding layers. Moreover, we propose an efficient mapping scheme to enhance the utilization of bank-level bandwidth. As a result, our architecture achieves up to 2.10x speedup and 31% energy saving for data movement over the state-of-the-art NMP solution for recommendation acceleration based on 3D-stacked memory.
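For readers unfamiliar with the workload, the following is a minimal Python sketch of the gather-and-reduce pattern that an embedding layer performs, which is what makes it memory-bound and irregular. It is not taken from the paper: the function names `embedding_bag_sum` and `bank_histogram`, the table sizes, and the row-interleaved bank mapping (`row % num_banks`) are all illustrative assumptions, and the paper's actual mapping scheme is not described in this record.

```python
import random


def embedding_bag_sum(table, indices):
    """Gather the rows named by `indices` and sum them element-wise."""
    dim = len(table[0])
    pooled = [0.0] * dim
    for row in indices:  # each lookup may touch an unrelated row in memory
        vec = table[row]
        for k in range(dim):
            pooled[k] += vec[k]
    return pooled


def bank_histogram(indices, num_banks):
    """Count lookups per bank under a hypothetical row-interleaved mapping."""
    counts = [0] * num_banks
    for row in indices:
        counts[row % num_banks] += 1  # assumed mapping, not the paper's scheme
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    num_rows, dim, num_banks = 1024, 8, 16  # illustrative sizes only
    table = [[float(r)] * dim for r in range(num_rows)]
    indices = [rng.randrange(num_rows) for _ in range(32)]
    print(embedding_bag_sum(table, indices)[0])
    print(bank_histogram(indices, num_banks))
```

Very little arithmetic is done per fetched row, so throughput depends mostly on how fast rows can be read. The histogram shows how unevenly random lookups can land across banks under a naive mapping, which is the kind of imbalance a mapping scheme would aim to reduce.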
Pages: 122-127
Page count: 6
Related papers
17 in total
  • [1] Co-ML: A Case for Collaborative ML Acceleration using Near-data Processing
    Aga, Shaizeen
    Jayasena, Nuwan
    Ignatowski, Mike
    [J]. MEMSYS 2019: PROCEEDINGS OF THE INTERNATIONAL SYMPOSIUM ON MEMORY SYSTEMS, 2019: 506-517
  • [2] A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing
    Ahn, Junwhan
    Hong, Sungpack
    Yoo, Sungjoo
    Mutlu, Onur
    Choi, Kiyoung
    [J]. 2015 ACM/IEEE 42ND ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA), 2015: 105-117
  • [3] PIM-Enabled Instructions: A Low-Overhead, Locality-Aware Processing-in-Memory Architecture
    Ahn, Junwhan
    Yoo, Sungjoo
    Mutlu, Onur
    Choi, Kiyoung
    [J]. 2015 ACM/IEEE 42ND ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA), 2015: 336-348
  • [4] [Anonymous], 2014, Hybrid Memory Cube Specification 2.1
  • [5] Chen K, 2012, DES AUT TEST EUROPE, P33
  • [6] Davidson James, 2010, P 4 ACM C REC SYST, P293, DOI 10.1145/1864708.1864770
  • [7] iPIM: Programmable In-Memory Image Processing Accelerator Using Near-Bank Architecture
    Gu, Peng
    Xie, Xinfeng
    Ding, Yufei
    Chen, Guoyang
    Zhang, Weifeng
    Niu, Dimin
    Xie, Yuan
    [J]. 2020 ACM/IEEE 47TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2020), 2020: 804-817
  • [8] The Architectural Implications of Facebook's DNN-based Personalized Recommendation
    Gupta, Udit
    Wu, Carole-Jean
    Wang, Xiaodong
    Naumov, Maxim
    Reagen, Brandon
    Brooks, David
    Cottel, Bradford
    Hazelwood, Kim
    Hempstead, Mark
    Jia, Bill
    Lee, Hsien-Hsin S.
    Malevich, Andrey
    Mudigere, Dheevatsa
    Smelyanskiy, Mikhail
    Xiong, Liang
    Zhang, Xuan
    [J]. 2020 IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2020), 2020: 488-501
  • [9] SPACE: Locality-Aware Processing in Heterogeneous Memory for Personalized Recommendations
    Kal, Hongju
    Lee, Seokmin
    Ko, Gun
    Ro, Won Woo
    [J]. 2021 ACM/IEEE 48TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2021), 2021: 679-691
  • [10] RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing
    Ke, Liu
    Gupta, Udit
    Cho, Benjamin Youngjae
    Brooks, David
    Chandra, Vikas
    Diril, Utku
    Firoozshahian, Amin
    Hazelwood, Kim
    Jia, Bill
    Lee, Hsien-Hsin S.
    Li, Meng
    Maher, Bert
    Mudigere, Dheevatsa
    Naumov, Maxim
    Schatz, Martin
    Smelyanskiy, Mikhail
    Wang, Xiaodong
    Reagen, Brandon
    Wu, Carole-Jean
    Hempstead, Mark
    Zhang, Xuan
    [J]. 2020 ACM/IEEE 47TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2020), 2020: 790-803