PL-NPU: An Energy-Efficient Edge-Device DNN Training Processor With Posit-Based Logarithm-Domain Computing

Cited by: 12
Authors
Wang, Yang [1 ,2 ]
Deng, Dazheng [1 ,2 ]
Liu, Leibo [1 ,2 ]
Wei, Shaojun [1 ,2 ]
Yin, Shouyi [1 ,2 ]
Affiliations
[1] Tsinghua Univ, Sch Integrated Circuits, Beijing Innovat Ctr Future Chip, Beijing 100084, Peoples R China
[2] Tsinghua Univ, Sch Integrated Circuits, Beijing Natl Res Ctr Informat Sci & Technol, Beijing 100084, Peoples R China
Keywords
DNN training processor; edge-devices; reconfigurable dataflow; posit; logarithm-domain computing; ACCELERATOR;
DOI
10.1109/TCSI.2022.3184115
CLC Classification
TM [Electrical Engineering]; TN [Electronic and Communication Technology];
Discipline Codes
0808 ; 0809 ;
Abstract
Edge-device deep neural network (DNN) training is practical for improving model adaptivity on unfamiliar datasets while avoiding privacy disclosure and heavy communication cost. Nevertheless, beyond the feed-forward (FF) pass used in inference, DNN training also requires back-propagation (BP) and weight-gradient (WG) computation, introducing power-hungry floating-point arithmetic, hardware underutilization, and an energy bottleneck from excessive memory access. This paper proposes a DNN training processor named PL-NPU that addresses these challenges with three innovations. First, a posit-based logarithm-domain processing element (PE) adapts to varying training-data requirements with a low-bit-width format and reduces energy by transforming complicated arithmetic into simple logarithm-domain operations. Second, a reconfigurable inter-intra-channel-reuse dataflow dynamically adjusts the PE mapping with a regrouping omega network to improve operand reuse for higher hardware utilization. Third, a pointed-stake-shaped codec unit adaptively compresses small values into a variable-length data format and large values into a fixed-length 8b posit format, reducing memory access to break the training energy bottleneck. Simulated in 28-nm CMOS technology, the proposed PL-NPU achieves a maximum frequency of 1040 MHz with 343 mW power consumption and a 5.28 mm² area. The peak energy efficiency is 3.87 TFLOPS/W at 0.6 V and 60 MHz. Compared with the state-of-the-art training processor, PL-NPU reaches 3.75× higher energy efficiency and offers a 1.68× speedup when training ResNet18.
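The abstract's first innovation pairs a low-bit-width posit format with logarithm-domain arithmetic, where a multiplication reduces to an addition of logarithms. As an illustration only (the paper states a fixed-length 8b posit format but not its exact field split, so posit⟨8,1⟩ and the software decoder below are assumptions, not the PL-NPU hardware), the idea can be sketched as:

```python
import math

def decode_posit(bits: int, n: int = 8, es: int = 1) -> float:
    """Decode an n-bit posit with es exponent bits.

    posit<8,1> is an illustrative assumption; the paper only mentions
    an 8b posit format, not its regime/exponent/fraction split.
    """
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float("nan")              # NaR ("not a real")
    sign = bits >> (n - 1)
    if sign:
        bits = (-bits) & mask            # negatives use two's complement
    # Regime: run of identical bits following the sign bit.
    rest = (bits << 1) & mask            # drop the sign bit
    first = rest >> (n - 1)
    run = 0
    while run < n - 1 and (rest >> (n - 1 - run)) & 1 == first:
        run += 1
    k = run - 1 if first else -run       # regime value
    rem = n - (1 + run + 1)              # bits after sign + regime + terminator
    tail = bits & ((1 << rem) - 1) if rem > 0 else 0
    e_bits = min(es, max(rem, 0))        # exponent field may be truncated
    e = (tail >> (rem - e_bits)) << (es - e_bits) if rem > 0 else 0
    f_bits = max(rem - es, 0)            # remaining bits form the fraction
    f = tail & ((1 << f_bits) - 1)
    frac = 1.0 + f / (1 << f_bits) if f_bits else 1.0
    value = 2.0 ** (k * (1 << es) + e) * frac
    return -value if sign else value

def log_mul(x: float, y: float) -> float:
    """Multiplication in the linear domain becomes addition in the log
    domain -- the arithmetic simplification a log-domain PE exploits."""
    return 2.0 ** (math.log2(x) + math.log2(y))   # x * y for positive x, y
```

For example, `decode_posit(0b01001000)` yields 1.5 (regime k=0, zero exponent bit, fraction 1000₂), and `log_mul(3.0, 5.0)` recovers 15.0 up to floating-point rounding; a hardware unit would quantize the log values rather than use full-precision `log2`.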
Pages: 4042-4055
Page count: 14
References
47 in total (first 10 listed)
[1]   A 7nm 4-Core AI Chip with 25.6TFLOPS Hybrid FP8 Training, 102.4TOPS INT4 Inference and Workload-Aware Throttling [J].
Agrawal, Ankur ;
Lee, Sae Kyu ;
Silberman, Joel ;
Ziegler, Matthew ;
Kang, Mingu ;
Venkataramani, Swagath ;
Cao, Nianzheng ;
Fleischer, Bruce ;
Guillorn, Michael ;
Cohen, Matthew ;
Mueller, Silvia ;
Oh, Jinwook ;
Lutz, Martin ;
Jung, Jinwook ;
Koswatta, Siyu ;
Zhou, Ching ;
Zalani, Vidhi ;
Bonanno, James ;
Casatuta, Robert ;
Chen, Chia-Yu ;
Choi, Jungwook ;
Haynie, Howard ;
Herbert, Alyssa ;
Jain, Radhika ;
Kar, Monodeep ;
Kim, Kyu-Hyoun ;
Li, Yulong ;
Ren, Zhibin ;
Rider, Scot ;
Schaal, Marcel ;
Schelm, Kerstin ;
Scheuermann, Michael ;
Sun, Xiao ;
Tran, Hung ;
Wang, Naigang ;
Wang, Wei ;
Zhang, Xin ;
Shah, Vinay ;
Curran, Brian ;
Srinivasan, Vijayalakshmi ;
Lu, Pong-Fei ;
Shukla, Sunil ;
Chang, Leland ;
Gopalakrishnan, Kailash .
2021 IEEE INTERNATIONAL SOLID-STATE CIRCUITS CONFERENCE (ISSCC), 2021, 64 :144-+
[2]  
Han S., Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
[3]  
Cambier L., 2020, PROC INT C LEARN REP
[4]   Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices [J].
Chen, Yu-Hsin ;
Yang, Tien-Ju ;
Emer, Joel S. ;
Sze, Vivienne .
IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, 2019, 9 (02) :292-308
[5]   A Deep Neural Network Training Architecture With Inference-Aware Heterogeneous Data-Type [J].
Choi, Seungkyu ;
Shin, Jaekang ;
Kim, Lee-Sup .
IEEE TRANSACTIONS ON COMPUTERS, 2022, 71 (05) :1216-1229
[6]   TrainWare: A Memory Optimized Weight Update Architecture for On-Device Convolutional Neural Network Training [J].
Choi, Seungkyu ;
Sim, Jaehyeong ;
Kang, Myeonggu ;
Kim, Lee-Sup .
PROCEEDINGS OF THE INTERNATIONAL SYMPOSIUM ON LOW POWER ELECTRONICS AND DESIGN (ISLPED '18), 2018, :104-109
[7]   An Energy-Efficient Deep Convolutional Neural Network Training Accelerator for In Situ Personalization on Smart Devices [J].
Choi, Seungkyu ;
Sim, Jaehyeong ;
Kang, Myeonggu ;
Choi, Yeongjae ;
Kim, Hyeonuk ;
Kim, Lee-Sup .
IEEE JOURNAL OF SOLID-STATE CIRCUITS, 2020, 55 (10) :2691-2702
[8]  
Choquette J., 2020, 2020 IEEE HOT CHIPS, P1
[9]   Arithmetic on the European logarithmic microprocessor [J].
Coleman, JN ;
Chester, EI ;
Softley, CI ;
Kadlec, J .
IEEE TRANSACTIONS ON COMPUTERS, 2000, 49 (07) :702-715
[10]  
Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPR.2009.5206848