Area-Efficient Distributed Arithmetic Optimization via Heuristic Decomposition and In-Memory Computing

Cited by: 4
Authors
Chen, Jian [1 ]
Zhao, Wenfeng [2 ]
Ha, Yajun [1 ]
Affiliations
[1] Shanghaitech Univ, Sch Informat & Sci Technol, Shanghai, Peoples R China
[2] Univ Minnesota, Dept Biomed Engn, Minneapolis, MN USA
Source
2019 IEEE 13TH INTERNATIONAL CONFERENCE ON ASIC (ASICON) | 2019
Keywords
SRAM; distributed arithmetic; in-memory computing; FIR;
DOI
10.1109/asicon47005.2019.8983659
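Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes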
0808 ; 0809 ;
Abstract
Distributed arithmetic (DA) is widely adopted in digital signal processing (DSP) applications such as filtering, linear transformations, and convolutions, where it offers both area and energy benefits. DA uses Look-Up Tables (LUTs), implemented in SRAM, to store all possible precomputed results. However, in a direct implementation the LUT size grows exponentially with the vector size. In this paper, we propose a novel in-memory computation design methodology that reduces the LUT size without heavily degrading speed or power performance. First, we propose a heuristic decomposition scheme under which only a minimal subset of the precomputed results needs to be stored in the LUT. Second, we design a novel multi-bit in-memory adder that exploits charge-sharing-based carry propagation. In the design case, applying our method to a state-of-the-art DA-based FIR filter reduces the overall area by 10% while maintaining the same speed and a similar level of energy.
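The exponential LUT growth mentioned in the abstract can be illustrated with a minimal sketch of textbook bit-serial DA. This is not the paper's heuristic decomposition or its in-memory adder: it assumes unsigned inputs, and the `split` option is only a plain partition of the coefficients into two sub-LUTs whose outputs are added. The names `build_lut` and `da_inner_product` are hypothetical.

```python
def build_lut(coeffs):
    """Precompute all 2**n partial sums: entry `addr` holds the sum of
    coeffs[k] over every bit position k that is set in `addr`."""
    n = len(coeffs)
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << n)
    ]


def da_inner_product(coeffs, xs, bits=8, split=None):
    """Bit-serial DA inner product for unsigned `bits`-wide inputs.

    With split=None a single LUT of 2**len(coeffs) entries is used.
    With split=s the coefficients are partitioned (an illustrative
    assumption, not the paper's scheme) into two sub-LUTs of 2**s and
    2**(n-s) entries, at the cost of one extra addition per bit plane.
    Returns (result, total number of LUT entries).
    """
    groups = [coeffs] if split is None else [coeffs[:split], coeffs[split:]]
    luts = [build_lut(g) for g in groups]
    acc = 0
    for b in range(bits):  # process one bit plane of the inputs at a time
        partial = 0
        offset = 0
        for g, lut in zip(groups, luts):
            addr = 0
            for k in range(len(g)):
                # bit b of each input forms one address bit of the LUT
                addr |= ((xs[offset + k] >> b) & 1) << k
            partial += lut[addr]
            offset += len(g)
        acc += partial << b  # weight the bit plane by 2**b
    return acc, sum(len(lut) for lut in luts)


if __name__ == "__main__":
    coeffs = [3, -1, 4, 2]
    xs = [5, 200, 17, 255]
    print(da_inner_product(coeffs, xs))           # single LUT
    print(da_inner_product(coeffs, xs, split=2))  # two sub-LUTs
```

For four coefficients the single LUT holds 16 entries, while the two-way partition holds 8 in total. The saving in storage is paid for with an extra adder, which is the kind of area trade-off the abstract refers to.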
Pages: 4