Flips: A Flexible Partitioning Strategy Near Memory Processing Architecture for Recommendation System

Cited by: 0
Authors
Qiu, Yudi [1 ]
Lu, Lingfei [2 ]
Yi, Shiyan [2 ]
Jing, Minge [2 ]
Zeng, Xiaoyang [2 ]
Kong, Yang [1 ]
Fan, Yibo [2 ]
Affiliations
[1] Alibaba Cloud Computing, Hangzhou 200240, People's Republic of China
[2] Fudan University, State Key Laboratory of Integrated Chips and Systems, Shanghai 200240, People's Republic of China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Vectors; Recommender systems; Data centers; Parallel processing; Bandwidth; Production; Memory management; Memory architecture; Hardware; Social networking (online); Memory system; near memory processing; recommendation system; accelerator;
DOI
10.1109/TPDS.2025.3539534
CLC Number
TP301 [Theory and Methods];
Subject Classification Code
081202;
Abstract
Personalized recommendation systems are massively deployed in production data centers. The memory-intensive embedding layers of recommendation systems are the crucial performance bottleneck, with operations manifesting as sparse memory lookups and simple reduction computations. Recent studies propose near-memory processing (NMP) architectures that speed up embedding operations by exploiting high internal memory bandwidth. However, these solutions typically employ a fixed vector partitioning strategy that fails to adapt to changes in data-center deployment scenarios, which limits their practicality. We propose Flips, a flexible-partitioning-strategy NMP architecture that accelerates embedding layers. Flips supports more than ten partitioning strategies through hardware-software co-design. Novel hardware architectures and address-mapping schemes are designed for both the memory side and the host side. We provide two approaches to determine the optimal partitioning strategy for each embedding table, enabling the architecture to accommodate changes in deployment scenarios. Importantly, Flips is decoupled from the NMP level and can exploit rank-level, bank-group-level, and bank-level parallelism. In peer-level NMP evaluations, Flips outperforms the state-of-the-art NMP solutions RecNMP, TRiM, and ReCross by up to 4.0x, 4.1x, and 3.5x, respectively.
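To make the embedding-layer bottleneck concrete, the sketch below illustrates the operation the abstract describes: a pooled embedding lookup (sparse gather followed by a simple element-wise reduction), and how a column-wise partition of each embedding vector lets independent memory units (ranks, bank groups, or banks) compute partial pools that the host only needs to concatenate. This is a hypothetical NumPy illustration of the general idea, not Flips' actual hardware-software implementation; the function names and parameters here are assumptions.

```python
import numpy as np

# Minimal sketch (not the paper's implementation): an embedding-pooling
# operation -- sparse lookups into a large table followed by a simple
# element-wise reduction (sum pooling).
def embedding_pooling(table, indices):
    """Gather rows of `table` at `indices` and reduce them to one vector."""
    return table[indices].sum(axis=0)

# Illustrative column-wise vector partitioning: each embedding vector is
# split across `num_units` NMP units (e.g., ranks, bank groups, or banks),
# so each unit gathers and reduces its own slice using internal bandwidth;
# the host merely concatenates the partial results. Names are hypothetical.
def partitioned_pooling(table, indices, num_units):
    slices = np.array_split(table, num_units, axis=1)            # split columns
    partials = [embedding_pooling(s, indices) for s in slices]   # per-unit work
    return np.concatenate(partials)                               # host-side merge

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    table = rng.standard_normal((1000, 64)).astype(np.float32)   # 1000 rows, dim 64
    indices = rng.integers(0, 1000, size=32)                      # one pooled lookup
    assert np.allclose(embedding_pooling(table, indices),
                       partitioned_pooling(table, indices, num_units=4),
                       atol=1e-5)
```

In this toy setting the partitioned result matches the monolithic pooling exactly; the architectural question the paper addresses is which partitioning (and at which NMP level) best balances load and bandwidth for a given table and deployment scenario.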
Pages: 745-758
Page count: 14
Related Papers
11 in total
  • [1] A Flexible Embedding-Aware Near Memory Processing Architecture for Recommendation System
    Lu, Lingfei
    Qiu, Yudi
    Yi, Shiyan
    Fan, Yibo
    IEEE COMPUTER ARCHITECTURE LETTERS, 2023, 22 (02) : 165 - 168
  • [2] RecPIM: Efficient In-Memory Processing for Personalized Recommendation Inference Using Near-Bank Architecture
    Yang, Weidong
    Yang, Yuqing
    Ji, Shuya
    Jiang, Jianfei
    Jing, Naifeng
    Wang, Qin
    Mao, Zhigang
    Sheng, Weiguang
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2024, 43 (10) : 2854 - 2867
  • [3] An Efficient Near-Bank Processing Architecture for Personalized Recommendation System
    Yang, Yuqing
    Yang, Weidong
    Wang, Qin
    Jing, Naifeng
    Jiang, Jianfei
    Mao, Zhigang
    Sheng, Weiguang
    2023 28TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE, ASP-DAC, 2023, : 122 - 127
  • [4] DRAMA: An Architecture for Accelerated Processing Near Memory
    Farmahini-Farahani, Amin
    Ahn, Jung Ho
    Morrow, Katherine
    Kim, Nam Sung
    IEEE COMPUTER ARCHITECTURE LETTERS, 2015, 14 (01) : 26 - 29
  • [5] Accelerating Personalized Recommendation with Cross-level Near-Memory Processing
    Liu, Haifeng
    Zheng, Long
    Huang, Yu
    Liu, Chaoqiang
    Ye, Xiangyu
    Yuan, Jingrui
    Liao, Xiaofei
    Jin, Hai
    Xue, Jingling
    PROCEEDINGS OF THE 2023 THE 50TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, ISCA 2023, 2023, : 924 - 936
  • [6] Charon: Specialized Near-Memory Processing Architecture for Clearing Dead Objects in Memory
    Jang, Jaeyoung
    Heo, Jun
    Lee, Yejin
    Won, Jaeyeon
    Kim, Seonghak
    Jung, Sung Jun
    Jang, Hakbeom
    Ham, Tae Jun
    Lee, Jae Woo
    MICRO'52: THE 52ND ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE, 2019, : 726 - 739
  • [7] TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning
    Kwon, Youngeun
    Lee, Yunjae
    Rhu, Minsoo
    MICRO'52: THE 52ND ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE, 2019, : 740 - 753
  • [8] Data Locality Aware Computation Offloading in Near Memory Processing Architecture for Big Data Applications
    Maity, Satanu
    Goel, Mayank
    Ghose, Manojit
    2023 IEEE 30TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS, HIPC 2023, 2023, : 288 - 297
  • [9] iPIM: Programmable In-Memory Image Processing Accelerator Using Near-Bank Architecture
    Gu, Peng
    Xie, Xinfeng
    Ding, Yufei
    Chen, Guoyang
    Zhang, Weifeng
    Niu, Dimin
    Xie, Yuan
    2020 ACM/IEEE 47TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2020), 2020, : 804 - 817
  • [10] NDRec: A Near-Data Processing System for Training Large-Scale Recommendation Models
    Li, Shiyu
    Wang, Yitu
    Hanson, Edward
    Chang, Andrew
    Ki, Yang Seok
    Li, Hai
    Chen, Yiran
    IEEE TRANSACTIONS ON COMPUTERS, 2024, 73 (05) : 1248 - 1261