Enabling Highly Efficient Capsule Networks Processing Through Software-Hardware Co-Design

Cited by: 6
Authors
Zhang, Xingyao [1]
Fu, Xin [1]
Zhuang, Donglin [2]
Xie, Chenhao [3]
Song, Shuaiwen Leon [2]
Affiliations
[1] Univ Houston, Dept Elect & Comp Engn, Houston, TX 77004 USA
[2] Univ Sydney, Future Syst Architecture FSA Lab, Sydney, NSW 2006, Australia
[3] Pacific Northwest Natl Lab PNNL, Richland, WA 99354 USA
Keywords
Accelerators; domain-specific architectures; machine learning; emerging technologies; image classification
DOI
10.1109/TC.2021.3056929
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
As demand for image processing grows, image features become increasingly complicated. Although Convolutional Neural Networks (CNNs) have been widely adopted for image processing tasks, they have been found to be easily misled because of their heavy use of pooling operations. A novel neural network structure called Capsule Networks (CapsNet) has been proposed to address this CNN challenge and fundamentally enhance the learning ability for image segmentation and object detection. Because CapsNet involves a high volume of matrix operations, it is generally accelerated on modern GPU platforms with highly optimized deep-learning libraries. However, the routing procedure of CapsNet introduces special program and execution features, including massive unshareable intermediate variables and intensive synchronizations, which cause inefficient CapsNet execution on modern GPUs. To address these challenges, we propose a set of software-hardware co-designed optimizations, SH-CapsNet, which includes software-level optimizations named S-CapsNet and a hybrid computing architecture design named PIM-CapsNet. At the software level, S-CapsNet reduces computation and memory accesses by exploiting the computational redundancy and data similarity of the routing procedure. At the hardware level, PIM-CapsNet leverages the processing-in-memory capability of today's 3D stacked memory to provide an off-chip in-memory acceleration solution for the routing procedure, while pipelining with the GPU's on-chip computing capability to accelerate the CNN-type layers in CapsNet. Evaluation results demonstrate that either our software or our hardware optimizations alone can significantly improve CapsNet execution efficiency. Together, our co-design achieves large improvements in both performance ($3.41\times$) and energy savings (68.72 percent) for CapsNet inference, with negligible accuracy loss.
Pages: 495-510
Page count: 16
Related papers
50 items in total
  • [1] Software-Hardware Co-design for Video Coding Acceleration
    Niu, Xinwei
    Galarza, Luis
    Gao, Ying
    Fan, Jeffrey
    2012 44TH SOUTHEASTERN SYMPOSIUM ON SYSTEM THEORY (SSST), 2012, : 57 - 60
  • [2] Facilitating Model-Based Control through Software-Hardware Co-Design
    Ramos, Joao
    Katz, Benjamin
    Chuah, Meng Yee
    Kim, Sangbae
    2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2018, : 566 - 572
  • [3] From Model to FPGA: Software-Hardware Co-Design for Efficient Neural Network Acceleration
    Guo, Kaiyun
    Sui, Lingzhi
    Qui, Jiantao
    Yao, Song
    Han, Song
    Wang, Yu
    Yang, Huanzhang
    2016 IEEE HOT CHIPS 28 SYMPOSIUM (HCS), 2016,
  • [4] Research on software-hardware co-design methodology for video encoder design
    Lai, Jin-Mei
    Zhang, Yong
    Yao, Qing-Dong
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design & Computer Graphics, 2000, 12 (06): : 468 - 472
  • [5] Analytically Modeling Application Execution for Software-Hardware Co-Design
    Guo, Jichi
    Meng, Jiayuan
    Yi, Qing
    Morozov, Vitali
    Kumaran, Kalyan
    2014 IEEE 28TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, 2014,
  • [6] Research on Software-hardware Co-design of Reconfigurable CNC System
    Wang, Tao
    Wang, Liwen
    Liu, Qingjian
    ADVANCED RESEARCH ON INDUSTRY, INFORMATION SYSTEMS AND MATERIAL ENGINEERING, PTS 1-7, 2011, 204-210 : 458 - +
  • [7] A software-hardware co-design method for deprivileging instructions in virtualization
    Tai, Y. (taiyunfang@ict.ac.cn), 1600, Inst. of Scientific and Technical Information of China (22):
  • [8] Exploring Hybrid Memory for GPU Energy Efficiency through Software-Hardware Co-Design
    Wang, Bin
    Wu, Bo
    Li, Dong
    Shen, Xipeng
    Yu, Weikuan
    Jiao, Yizheng
    Vetter, Jeffrey S.
    2013 22ND INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES (PACT), 2013, : 93 - 102
  • [9] PCTC: Hardware and Software Co-design for Pruned Capsule Networks on Tensor Cores
    Hafezan, Mohammad
    Jahadi, Reza
    Atoofian, Ehsan
    EURO-PAR 2024: PARALLEL PROCESSING, PART II, EURO-PAR 2024, 2024, 14802 : 196 - 210
  • [10] Energy-Efficient Inference With Software-Hardware Co-Design for Sustainable Artificial Intelligence of Things
    Dai, Shengxin
    Luo, Zheng
    Luo, Wendian
    Wang, Siyi
    Dai, Cheng
    Guo, Bing
    Zhou, Xiaokang
    IEEE INTERNET OF THINGS JOURNAL, 2024, 11 (24): : 39170 - 39182