Flips: A Flexible Partitioning Strategy Near Memory Processing Architecture for Recommendation System

Cited by: 0
Authors
Qiu, Yudi [1 ]
Lu, Lingfei [2 ]
Yi, Shiyan [2 ]
Jing, Minge [2 ]
Zeng, Xiaoyang [2 ]
Kong, Yang [1 ]
Fan, Yibo [2 ]
Affiliations
[1] Alibaba Cloud Computing, Hangzhou 200240, People's Republic of China
[2] Fudan University, State Key Laboratory of Integrated Chips and Systems, Shanghai 200240, People's Republic of China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Vectors; Recommender systems; Data centers; Parallel processing; Bandwidth; Production; Memory management; Memory architecture; Hardware; Social networking (online); Memory system; near memory processing; recommendation system; accelerator;
DOI
10.1109/TPDS.2025.3539534
CLC Number
TP301 [Theory and Methods];
Subject Classification Code
081202;
Abstract
Personalized recommendation systems are massively deployed in production data centers. The memory-intensive embedding layers of recommendation systems are the crucial performance bottleneck, with operations manifesting as sparse memory lookups and simple reduction computations. Recent studies propose near-memory processing (NMP) architectures that speed up embedding operations by exploiting high internal memory bandwidth. However, these solutions typically employ a fixed vector partitioning strategy that fails to adapt to changes in data-center deployment scenarios, which limits their practicality. We propose Flips, a flexible-partitioning-strategy NMP architecture that accelerates embedding layers. Flips supports more than ten partitioning strategies through hardware-software co-design. Novel hardware architectures and address-mapping schemes are designed for both the memory side and the host side. We provide two approaches to determine the optimal partitioning strategy for each embedding table, enabling the architecture to accommodate changes in deployment scenarios. Importantly, Flips is decoupled from the NMP level and can exploit rank-level, bank-group-level, and bank-level parallelism. In peer-level NMP evaluations, Flips outperforms the state-of-the-art NMP solutions RecNMP, TRiM, and ReCross by up to 4.0x, 4.1x, and 3.5x, respectively.
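To make the embedding-layer bottleneck concrete, the sketch below illustrates the operation the abstract describes: a pooled embedding lookup (sparse gather followed by a simple element-wise reduction), and how a column-wise partition of each embedding vector lets independent memory units (ranks, bank groups, or banks) compute partial pools that the host only needs to concatenate. This is a hypothetical NumPy illustration of the general idea, not Flips' actual hardware-software implementation; the function names and parameters here are assumptions.

```python
import numpy as np

# Minimal sketch (not the paper's implementation): an embedding-pooling
# operation -- sparse lookups into a large table followed by a simple
# element-wise reduction (sum pooling).
def embedding_pooling(table, indices):
    """Gather rows of `table` at `indices` and reduce them to one vector."""
    return table[indices].sum(axis=0)

# Illustrative column-wise vector partitioning: each embedding vector is
# split across `num_units` NMP units (e.g., ranks, bank groups, or banks),
# so each unit gathers and reduces its own slice using internal bandwidth;
# the host merely concatenates the partial results. Names are hypothetical.
def partitioned_pooling(table, indices, num_units):
    slices = np.array_split(table, num_units, axis=1)            # split columns
    partials = [embedding_pooling(s, indices) for s in slices]   # per-unit work
    return np.concatenate(partials)                               # host-side merge

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    table = rng.standard_normal((1000, 64)).astype(np.float32)   # 1000 rows, dim 64
    indices = rng.integers(0, 1000, size=32)                      # one pooled lookup
    assert np.allclose(embedding_pooling(table, indices),
                       partitioned_pooling(table, indices, num_units=4),
                       atol=1e-5)
```

In this toy setting the partitioned result matches the monolithic pooling exactly; the architectural question the paper addresses is which partitioning (and at which NMP level) best balances load and bandwidth for a given table and deployment scenario.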
Pages: 745-758
Page count: 14
Related Papers
11 in total
  • [1] A Flexible Embedding-Aware Near Memory Processing Architecture for Recommendation System
    Lu, Lingfei
    Qiu, Yudi
    Yi, Shiyan
    Fan, Yibo
    IEEE COMPUTER ARCHITECTURE LETTERS, 2023, 22 (02) : 165 - 168
  • [2] RecPIM: Efficient In-Memory Processing for Personalized Recommendation Inference Using Near-Bank Architecture
    Yang, Weidong
    Yang, Yuqing
    Ji, Shuya
    Jiang, Jianfei
    Jing, Naifeng
    Wang, Qin
    Mao, Zhigang
    Sheng, Weiguang
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2024, 43 (10) : 2854 - 2867
  • [3] An Efficient Near-Bank Processing Architecture for Personalized Recommendation System
    Yang, Yuqing
    Yang, Weidong
    Wang, Qin
    Jing, Naifeng
    Jiang, Jianfei
    Mao, Zhigang
    Sheng, Weiguang
    2023 28TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE, ASP-DAC, 2023, : 122 - 127
  • [4] DRAMA: An Architecture for Accelerated Processing Near Memory
    Farmahini-Farahani, Amin
    Ahn, Jung Ho
    Morrow, Katherine
    Kim, Nam Sung
    IEEE COMPUTER ARCHITECTURE LETTERS, 2015, 14 (01) : 26 - 29
  • [5] Accelerating Personalized Recommendation with Cross-level Near-Memory Processing
    Liu, Haifeng
    Zheng, Long
    Huang, Yu
    Liu, Chaoqiang
    Ye, Xiangyu
    Yuan, Jingrui
    Liao, Xiaofei
    Jin, Hai
    Xue, Jingling
    PROCEEDINGS OF THE 2023 THE 50TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, ISCA 2023, 2023, : 924 - 936
  • [6] Charon: Specialized Near-Memory Processing Architecture for Clearing Dead Objects in Memory
    Jang, Jaeyoung
    Heo, Jun
    Lee, Yejin
    Won, Jaeyeon
    Kim, Seonghak
    Jung, Sung Jun
    Jang, Hakbeom
    Ham, Tae Jun
    Lee, Jae Woo
    MICRO'52: THE 52ND ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE, 2019, : 726 - 739
  • [7] TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning
    Kwon, Youngeun
    Lee, Yunjae
    Rhu, Minsoo
    MICRO'52: THE 52ND ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE, 2019, : 740 - 753
  • [8] Data Locality Aware Computation Offloading in Near Memory Processing Architecture for Big Data Applications
    Maity, Satanu
    Goel, Mayank
    Ghose, Manojit
    2023 IEEE 30TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS, HIPC 2023, 2023, : 288 - 297
  • [9] iPIM: Programmable In-Memory Image Processing Accelerator Using Near-Bank Architecture
    Gu, Peng
    Xie, Xinfeng
    Ding, Yufei
    Chen, Guoyang
    Zhang, Weifeng
    Niu, Dimin
    Xie, Yuan
    2020 ACM/IEEE 47TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2020), 2020, : 804 - 817
  • [10] NDRec: A Near-Data Processing System for Training Large-Scale Recommendation Models
    Li, Shiyu
    Wang, Yitu
    Hanson, Edward
    Chang, Andrew
    Ki, Yang Seok
    Li, Hai
    Chen, Yiran
    IEEE TRANSACTIONS ON COMPUTERS, 2024, 73 (05) : 1248 - 1261