High Performance and Power Efficient Accelerator for Cloud Inference

Cited by: 3
|
Authors
Yao, Jianguo [1 ,2 ]
Zhou, Hao [2 ]
Zhang, Yalin [2 ]
Li, Ying [2 ]
Feng, Chuang [2 ]
Chen, Shi [2 ]
Chen, Jiaoyan [2 ]
Wang, Yongdong [2 ]
Hu, Qiaojuan [2 ]
Affiliations
[1] SJTU, Shanghai, Peoples R China
[2] Enflame Tech Inc, Shanghai, Peoples R China
Source
2023 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, HPCA | 2023
Keywords
REGISTER FILE; ARCHITECTURE; TIME;
DOI
10.1109/HPCA56546.2023.10070941
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
Facing the growing complexity of Deep Neural Networks (DNNs), high-performance and power-efficient AI accelerators are desired to provide effective and affordable cloud inference services. We introduce our flagship product, the Cloudblazer i20 accelerator, which integrates the novel Deep Thinking Unit (DTU 2.0). The design is driven by requirements drawn from various AI inference applications and insights learned from our previous products. With careful tradeoffs in hardware-software co-design, Cloudblazer i20 delivers impressive performance and energy efficiency while maintaining acceptable hardware costs and software complexity/flexibility. To tackle computation- and data-intensive workloads, DTU 2.0 integrates powerful vector/matrix engines and a large-capacity, multi-level memory hierarchy with high bandwidth. It supports comprehensive data flow and synchronization patterns to fully exploit parallelism in computation and memory access within or among concurrent tasks. Moreover, it enables sparse data compression/decompression, data broadcasting, repeated data transfer, and kernel code prefetching to optimize bandwidth utilization and reduce data access overheads. To utilize the underlying hardware and simplify the development of customized DNNs/operators, the software stack enables automatic optimizations (such as operator fusion and data flow tuning) and provides diverse programming interfaces for developers. Lastly, energy consumption is optimized through dynamic power integrity and efficiency management, eliminating integrity risks and energy waste. Depending on performance requirements, developers can also assign their workloads to all or part of the hardware resources. Evaluated with 10 representative DNN models widely adopted in various domains, Cloudblazer i20 outperforms Nvidia T4 and A10 GPUs by geometric means of 2.22x and 1.16x in performance and 1.04x and 1.17x in energy efficiency, respectively.
The improvements demonstrate the effectiveness of Cloudblazer i20's design, which emphasizes performance, efficiency, and flexibility.
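The operator fusion mentioned for the software stack can be illustrated with a toy sketch. The function names and operator choice (scale, bias, ReLU) are hypothetical and are not Enflame's API; the point is that a fused kernel traverses the data once instead of materializing an intermediate array per operator:

```python
def unfused_scale_bias_relu(xs, scale, bias):
    """Three separate elementwise passes: each step writes a full
    intermediate list back to memory before the next step reads it."""
    t1 = [x * scale for x in xs]        # intermediate 1
    t2 = [t + bias for t in t1]         # intermediate 2
    return [max(t, 0.0) for t in t2]    # final output

def fused_scale_bias_relu(xs, scale, bias):
    """One fused pass: multiply, add, and clamp each element in a
    single loop, with no intermediate arrays written to memory."""
    return [max(x * scale + bias, 0.0) for x in xs]
```

Both functions compute the same result; the fused form models what a fusing compiler emits, trading N intermediate stores/loads per operator for a single traversal.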
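The cross-model summary numbers (e.g., 2.22x over T4) are geometric means over per-model speedup ratios. A minimal helper, using illustrative ratios that are not the paper's measured data:

```python
import math

def geomean_speedup(ratios):
    """Geometric mean of per-model speedup ratios: exp(mean(log r)).
    Unlike the arithmetic mean, one outlier model cannot dominate."""
    assert ratios, "need at least one measurement"
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical per-model speedups for illustration only:
print(geomean_speedup([2.0, 1.5, 3.0]))  # cube root of 9.0
```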
Pages: 1003-1016
Page count: 14