Algorithm-hardware Co-design for Deformable Convolution

Cited by: 5
Authors
Huang, Qijing [1]
Wang, Dequan [1]
Gao, Yizhao [2]
Cai, Yaohui [3]
Dong, Zhen [1]
Wu, Bichen [1]
Keutzer, Kurt [1]
Wawrzynek, John [1]
Affiliations
[1] University of California, Berkeley, Berkeley, CA 94720, USA
[2] University of Chinese Academy of Sciences, Beijing, People's Republic of China
[3] Peking University, Beijing, People's Republic of China
Source
FIFTH WORKSHOP ON ENERGY EFFICIENT MACHINE LEARNING AND COGNITIVE COMPUTING - NEURIPS EDITION (EMC2-NIPS 2019) | 2019
DOI
10.1109/EMC2-NIPS53020.2019.00019
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
FPGAs provide a flexible and efficient platform for accelerating rapidly changing computer vision algorithms. Most existing work focuses on accelerating image classification, while other fundamental vision problems, including object detection and instance segmentation, have not been adequately addressed. Compared with image classification, detection problems are more sensitive to the spatial variance of objects and therefore require specialized convolutions to aggregate spatial information. To address this, recent work proposes dynamic deformable convolution to augment regular convolutions. Regular convolutions process a fixed grid of pixels at every spatial location in an image, while dynamic deformable convolutions may access arbitrary pixels. The access pattern of a deformable convolution is input-dependent and varies per spatial location. These properties lead to inefficient memory accesses of inputs on existing hardware. In this work, we first investigate the overhead of the deformable convolution on embedded FPGA SoCs, and then show the accuracy-latency tradeoffs for a set of algorithm modifications, including full versus depthwise, fixed-shape, and limited-range. These modifications benefit the efficiency of the embedded accelerator in general. We build an efficient object detection network with modified deformable convolutions and quantize the network using state-of-the-art quantization methods. Experiments show that our co-design optimization for the deformable convolution achieves significant hardware speedup with little loss of accuracy.
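The contrast the abstract draws between a fixed sampling grid and input-dependent sampling can be illustrated with a minimal, single-channel sketch in plain Python. It is not the authors' implementation: the function names, the 3x3 kernel size, zero padding outside the image, and the `max_offset` clamp (one possible reading of the "limited-range" modification, which the abstract does not define) are all assumptions made for illustration.

```python
import math

def bilinear(img, y, x):
    """Sample img (a list of rows) at fractional (y, x).
    Positions outside the image contribute 0 (assumed zero padding)."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    total = 0.0
    # Blend the four integer neighbours with bilinear weights.
    for yy, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yy < h and 0 <= xx < w:
                total += wy * wx * img[yy][xx]
    return total

def deform_conv_at(img, weights, offsets, cy, cx, max_offset=None):
    """One output value of a 3x3 deformable convolution centred at (cy, cx).

    weights: 3x3 list of floats.
    offsets: 3x3 list of (dy, dx) pairs for THIS output location; in a real
             network they are predicted from the input, which is what makes
             the access pattern input-dependent.
    max_offset: hypothetical clamp on |dy| and |dx| (assumed interpretation
                of "limited-range"); None leaves offsets unbounded.
    """
    out = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i][j]
            if max_offset is not None:
                dy = max(-max_offset, min(max_offset, dy))
                dx = max(-max_offset, min(max_offset, dx))
            # Regular grid position (cy + i - 1, cx + j - 1) plus the offset.
            out += weights[i][j] * bilinear(img, cy + i - 1 + dy, cx + j - 1 + dx)
    return out

# Small usage example on a 5x5 ramp image.
img = [[r * 5 + c for c in range(5)] for r in range(5)]
ones = [[1.0] * 3 for _ in range(3)]
zero_offsets = [[(0.0, 0.0)] * 3 for _ in range(3)]
half_offsets = [[(0.5, 0.5)] * 3 for _ in range(3)]

regular = deform_conv_at(img, ones, zero_offsets, 2, 2)   # fixed 3x3 grid
deformed = deform_conv_at(img, ones, half_offsets, 2, 2)  # shifted samples
```

With all offsets set to zero the function reduces to a regular 3x3 convolution, so the only extra cost is the per-sample address computation and the four-neighbour read in `bilinear`. Unbounded offsets let each sample land anywhere in the image, which is the irregular memory access the abstract describes; clamping them keeps every read inside a small, predictable window around the output location.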
Pages: 48-51
Page count: 4