ICE: An Intelligent Cognition Engine with 3D NAND-based In-Memory Computing for Vector Similarity Search Acceleration

Cited by: 10
Authors
Hu, Han-Wen [1, 2, 3]
Wang, Wei-Chen [4]
Chang, Yuan-Hao [5]
Lee, Yung-Chun [1]
Lin, Bo-Rong [1]
Wang, Huai-Mu [1]
Lin, Yen-Po [1]
Huang, Yu-Ming [1]
Lee, Chong-Ying [1]
Su, Tzu-Hsiang [1]
Hsieh, Chih-Chang [1]
Hu, Chia-Ming [1]
Lai, Yi-Ting [1]
Chen, Chung-Kuang [1]
Chen, Han-Sung [1]
Li, Hsiang-Pang [1]
Kuo, Tei-Wei [4, 6, 7, 8]
Chang, Meng-Fan [2, 3]
Wang, Keh-Chung [1]
Hung, Chun-Hsiung [1]
Lu, Chih-Yuan [1]
Affiliations
[1] Macronix Int Co Ltd, Hsinchu, Taiwan
[2] Natl Tsing Hua Univ, Dept Elect Engn, Hsinchu, Taiwan
[3] MIT, Dept Elect Engn & Comp Sci, Cambridge, MA 02139 USA
[4] Natl Taiwan Univ, Dept Comp Sci & Informat Engn, New Taipei, Taiwan
[5] Acad Sinica, Inst Informat Sci, New Taipei, Taiwan
[6] Natl Taiwan Univ, Grad Inst Elect Engn, New Taipei, Taiwan
[7] Natl Taiwan Univ, Grad Inst Networking & Multimedia, New Taipei, Taiwan
[8] Natl Taiwan Univ, High Performance & Sci Comp Ctr, New Taipei, Taiwan
Source
2022 55TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO) | 2022
Keywords
3D NAND; In-Memory Computing; Vector Similarity Search; Unstructured Data Search; PARITY-CHECK CODES; EUCLIDEAN DISTANCE; MACRO; FLASH;
DOI
10.1109/MICRO56248.2022.00058
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Vector similarity search (VSS) for unstructured vectors generated via machine learning methods is a promising solution for many applications, such as face search. With increasing awareness of and concern about data security requirements, there is a compelling need to store data and process VSS applications locally on edge devices rather than send data to servers for computation. However, the explosive amount of data movement from NAND storage to DRAM across the memory hierarchy, together with processing of the entire dataset, consumes enormous energy and incurs long latency for VSS applications. In particular, edge devices with insufficient DRAM capacity trigger data swapping, which degrades execution performance. To overcome this crucial hurdle, we propose an intelligent cognition engine (ICE) with cognitive 3D NAND, featuring non-volatile in-memory computing (nvIMC) to accelerate processing, suppress data movement, and reduce data swapping between the processor and storage. This cognitive 3D NAND features digital nvIMC techniques (i.e., an ADC/DAC-free approach), high-density 3D NAND, and compatibility with standard 3D NAND products with minor modifications. To facilitate parallel INT8/INT4 vector-vector multiplication (VVM) and mitigate the reliability issues of 3D NAND, we develop a bit-error-tolerant data encoding and a two's complement-based digital accumulator. VVM can support similarity computations (e.g., cosine similarity and Euclidean distance), which are required to search for "the most similar data" right where they are stored. In addition, the proposed solution can be realized on edge storage products, e.g., embedded MultiMedia Card (eMMC). The measured and simulated results on real 3D NAND chips show that ICE improves system execution time by 17x to 95x and energy efficiency by 11x to 140x, compared to traditional von Neumann approaches using state-of-the-art edge systems with MobileFaceNet on the CASIA-WebFace dataset.
To the best of our knowledge, this work demonstrates the first 3D NAND-based digital nvIMC technique with measured silicon data.
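The abstract states that VVM can support both cosine similarity and Euclidean distance. A minimal sketch of why this holds is given here: both metrics reduce to dot products, since the squared Euclidean distance equals a·a - 2(a·b) + b·b. This is plain software for illustration only and is not the paper's in-memory implementation; the function names (`vvm`, `cosine_similarity`, `squared_euclidean`, `most_similar`) and the sample vectors are hypothetical, and the vectors are assumed to be non-zero integer vectors of equal length.

```python
import math


def vvm(a, b):
    """Vector-vector multiplication (dot product) of two integer vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine similarity built from three dot products (non-zero vectors assumed)."""
    return vvm(a, b) / math.sqrt(vvm(a, a) * vvm(b, b))


def squared_euclidean(a, b):
    """Squared Euclidean distance via the identity |a-b|^2 = a.a - 2 a.b + b.b."""
    return vvm(a, a) - 2 * vvm(a, b) + vvm(b, b)


def most_similar(query, database):
    """Index of the stored vector with the highest cosine similarity to the query."""
    return max(range(len(database)),
               key=lambda i: cosine_similarity(query, database[i]))


# Hypothetical sample data: small integer vectors standing in for INT8 features.
query = [1, 2, 3]
database = [[3, 2, 1], [2, 4, 6], [-1, -2, -3]]
best = most_similar(query, database)
```

In this sketch the per-vector terms a·a and b·b could be computed once and reused, so a search over many stored vectors is dominated by the a·b dot products, which is the operation the abstract describes accelerating.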
Pages: 763-783
Number of pages: 21