ICE: An Intelligent Cognition Engine with 3D NAND-based In-Memory Computing for Vector Similarity Search Acceleration

Cited by: 10
Authors
Hu, Han-Wen [1, 2, 3]
Wang, Wei-Chen [4]
Chang, Yuan-Hao [5]
Lee, Yung-Chun [1]
Lin, Bo-Rong [1]
Wang, Huai-Mu [1]
Lin, Yen-Po [1]
Huang, Yu-Ming [1]
Lee, Chong-Ying [1]
Su, Tzu-Hsiang [1]
Hsieh, Chih-Chang [1]
Hu, Chia-Ming [1]
Lai, Yi-Ting [1]
Chen, Chung-Kuang [1]
Chen, Han-Sung [1]
Li, Hsiang-Pang [1]
Kuo, Tei-Wei [4, 6, 7, 8]
Chang, Meng-Fan [2, 3]
Wang, Keh-Chung [1]
Hung, Chun-Hsiung [1]
Lu, Chih-Yuan [1]
Affiliations
[1] Macronix Int Co Ltd, Hsinchu, Taiwan
[2] Natl Tsing Hua Univ, Dept Elect Engn, Hsinchu, Taiwan
[3] MIT, Dept Elect Engn & Comp Sci, Cambridge, MA 02139 USA
[4] Natl Taiwan Univ, Dept Comp Sci & Informat Engn, New Taipei, Taiwan
[5] Acad Sinica, Inst Informat Sci, New Taipei, Taiwan
[6] Natl Taiwan Univ, Grad Inst Elect Engn, New Taipei, Taiwan
[7] Natl Taiwan Univ, Grad Inst Networking & Multimedia, New Taipei, Taiwan
[8] Natl Taiwan Univ, High Performance & Sci Comp Ctr, New Taipei, Taiwan
Source
2022 55TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO) | 2022
Keywords
3D NAND; In-Memory Computing; Vector Similarity Search; Unstructured Data Search; PARITY-CHECK CODES; EUCLIDEAN DISTANCE; MACRO; FLASH;
DOI
10.1109/MICRO56248.2022.00058
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Vector similarity search (VSS) for unstructured vectors generated via machine learning methods is a promising solution for many applications, such as face search. With increasing awareness of and concern about data security requirements, there is a compelling need to store data and process VSS applications locally on edge devices rather than send data to servers for computation. However, the explosive amount of data movement from NAND storage to DRAM across the memory hierarchy, together with processing of the entire dataset, consumes enormous energy and incurs long latency for VSS applications. In particular, edge devices with insufficient DRAM capacity trigger data swapping, which degrades execution performance. To overcome this crucial hurdle, we propose an intelligent cognition engine (ICE) with cognitive 3D NAND, featuring non-volatile in-memory computing (nvIMC) to accelerate the processing, suppress the data movement, and reduce data swapping between the processor and storage. This cognitive 3D NAND features digital nvIMC techniques (i.e., an ADC/DAC-free approach), high-density 3D NAND, and compatibility with standard 3D NAND products with minor modifications. To facilitate parallel INT8/INT4 vector-vector multiplication (VVM) and mitigate the reliability issues of 3D NAND, we develop a bit-error-tolerance data encoding and a two's complement-based digital accumulator. VVM can support similarity computations (e.g., cosine similarity and Euclidean distance), which are required to search "the most similar data" right where they are stored. In addition, the proposed solution can be realized on edge storage products, e.g., embedded MultiMedia Card (eMMC). The measured and simulated results on real 3D NAND chips show that ICE improves system execution time by 17× to 95× and energy efficiency by 11× to 140×, compared to traditional von Neumann approaches using state-of-the-art edge systems with MobileFaceNet on the CASIA-WebFace dataset. To the best of our knowledge, this work demonstrates the first 3D NAND-based digital nvIMC technique with measured silicon data.
Pages: 763-783
Number of pages: 21