Enabling Highly Efficient Capsule Networks Processing Through Software-Hardware Co-Design

Cited by: 6

Authors:

Zhang, Xingyao [1]

Fu, Xin [1]

Zhuang, Donglin [2]

Xie, Chenhao [3]

Song, Shuaiwen Leon [2]

Affiliations:

[1] Univ Houston, Dept Elect & Comp Engn, Houston, TX 77004 USA

[2] Univ Sydney, Future Syst Architecture FSA Lab, Sydney, NSW 2006, Australia

[3] Pacific Northwest Natl Lab PNNL, Richland, WA 99354 USA

Source:

IEEE TRANSACTIONS ON COMPUTERS | 2021 / Vol. 70 / Issue 04

Keywords:

Accelerators; domain-specific architectures; machine learning; emerging technologies; IMAGE CLASSIFICATION;

DOI:

10.1109/TC.2021.3056929

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

As demand for image processing grows, image features become increasingly complicated. Although Convolutional Neural Networks (CNNs) have been widely adopted for image processing tasks, they have been found to be easily misled because of their heavy use of pooling operations. A novel neural network structure called Capsule Networks (CapsNet) has been proposed to address this CNN challenge and substantially enhance the learning ability for image segmentation and object detection. Because CapsNet involves a high volume of matrix operations, it is generally accelerated on modern GPU platforms with highly optimized deep-learning libraries. However, the routing procedure of CapsNet introduces special program and execution features, including massive unshareable intermediate variables and intensive synchronizations, which cause inefficient CapsNet execution on modern GPUs. To address these challenges, we propose software-hardware co-designed optimizations, SH-CapsNet, which include software-level optimizations named S-CapsNet and a hybrid computing architecture design named PIM-CapsNet. At the software level, S-CapsNet reduces computation and memory accesses by exploiting the computational redundancy and data similarity of the routing procedure. At the hardware level, PIM-CapsNet leverages the processing-in-memory capability of today's 3D stacked memory to provide an off-chip in-memory acceleration solution for the routing procedure, while pipelining with the GPU's on-chip computing capability to accelerate the CNN-type layers in CapsNet. Evaluation results demonstrate that either our software or hardware optimizations can significantly improve CapsNet execution efficiency. Together, our co-design achieves large improvements in both performance ($3.41\times$) and energy savings (68.72 percent) for CapsNet inference, with negligible accuracy loss.
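
The abstract does not spell out the routing procedure it optimizes. As a rough illustration only, the sketch below assumes the widely used dynamic routing-by-agreement scheme for capsule networks; it is not the S-CapsNet or PIM-CapsNet method. The names `squash`, `softmax`, `dynamic_routing`, and `u_hat` are hypothetical. It shows the per-input intermediate values (`b`, `c`, `s_j`) that are rebuilt on every iteration, and the dependence of each iteration on the previous one, which is one plausible reading of the "intermediate variables" and "synchronizations" the abstract mentions.

```python
import math

def squash(s):
    """Scale vector s to a length in [0, 1) while keeping its direction."""
    sq = sum(x * x for x in s)
    if sq == 0.0:
        return [0.0 for _ in s]
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * x for x in s]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_routing(u_hat, num_iters=3):
    """Assumed routing-by-agreement sketch.

    u_hat[i][j] is the prediction vector from low-level capsule i
    for high-level capsule j. Returns (v, c): output capsule vectors
    and the coupling coefficients used in the last iteration.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, per input
    v, c = [], []
    for _ in range(num_iters):
        # Coupling coefficients: softmax over output capsules for each input capsule.
        c = [softmax(row) for row in b]
        # Weighted sum of predictions, then squash, for each output capsule.
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Agreement update: needs every v[j] from this iteration first.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v, c

# Two low-level capsules agree on output 0 and disagree on output 1.
u_hat = [[[1.0, 0.0], [0.0, 0.1]],
         [[1.0, 0.0], [0.0, -0.1]]]
v, c = dynamic_routing(u_hat)
```

In this toy input, the coupling coefficients shift toward output capsule 0, where the two predictions agree.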

Citation

Pages: 495 - 510

Page count: 16
