Adaptive deformable convolutional network

Cited by: 62
Authors
Chen, Feng [1]
Wu, Fei [2]
Xu, Jing [3]
Gao, Guangwei [2, 6]
Ge, Qi [4]
Jing, Xiao-Yuan [2, 5]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Sch Comp Sci, Nanjing, Peoples R China
[2] Nanjing Univ Posts & Telecommun, Coll Automat, Nanjing, Peoples R China
[3] Hohai Univ, Sch Law, Nanjing, Peoples R China
[4] Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing, Peoples R China
[5] Wuhan Univ, Sch Comp, Wuhan, Peoples R China
[6] Nanjing Univ Posts & Telecommun, Inst Adv Technol, Nanjing, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Deformable convolution; Semantic segmentation; Object detection; Geometric transformation;
DOI
10.1016/j.neucom.2020.06.128
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deformable Convolutional Networks (DCNs) were proposed to address the inherently limited ability of CNNs to model geometric transformations, and they show outstanding performance on sophisticated computer vision tasks. Although they can rule out irrelevant image content and focus on regions of interest to some degree, their adaptive learning of the deformation is still limited. In this paper, we examine this limitation from the perspectives of deformable modules and deformable organizations in order to extend the scope of the deformation ability. Concretely, on the one hand, we reformulate deformable convolution and RoI pooling by reconsidering spatial-wise attention, channel-wise attention and spatial-channel interdependency, to improve a single convolution's ability to focus on pertinent image content. On the other hand, we conduct an empirical study of various general arrangements of deformable convolutions (e.g., connection type) in DCNs. On semantic segmentation in particular, the study yields significant findings on how to combine deformable convolutions properly. To verify the effectiveness and superiority of the proposed deformable modules, we also provide an extensive ablation study and compare them with previous versions. With the proposed contributions, our refined Deformable ConvNets achieve state-of-the-art performance on two semantic segmentation benchmarks (PASCAL VOC 2012 and Cityscapes) and an object detection benchmark (MS COCO). (c) 2020 Elsevier B.V. All rights reserved.
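The abstract does not give the paper's formulas, so the following is only a minimal sketch of the generic idea it builds on: a 3x3 deformable convolution evaluated at one output location, where each kernel tap samples the feature map at a learned fractional offset (via bilinear interpolation) and is scaled by a per-tap modulation scalar. The function names, the argument layout, and the use of a simple scalar mask are assumptions made for illustration; this is not the authors' reformulation with spatial-wise and channel-wise attention.

```python
import math


def bilinear_sample(feature, y, x):
    """Sample a 2D feature map (list of rows) at fractional (y, x).

    Positions outside the map contribute zero.
    """
    h, w = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                # Bilinear weight falls off linearly with distance.
                wy = 1.0 - abs(y - yi)
                wx = 1.0 - abs(x - xi)
                value += wy * wx * feature[yi][xi]
    return value


def deformable_conv_at(feature, weights, offsets, masks, cy, cx):
    """Evaluate a 3x3 modulated deformable convolution at (cy, cx).

    weights: 9 kernel weights, row-major over the 3x3 grid.
    offsets: 9 (dy, dx) pairs, one learned offset per kernel tap.
    masks:   9 modulation scalars (hypothetical stand-in for attention).
    """
    out = 0.0
    k = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            off_y, off_x = offsets[k]
            sample = bilinear_sample(feature, cy + dy + off_y, cx + dx + off_x)
            out += weights[k] * masks[k] * sample
            k += 1
    return out


feature = [[1.0, 2.0, 3.0],
           [4.0, 5.0, 6.0],
           [7.0, 8.0, 9.0]]
weights = [1.0] * 9
zero_offsets = [(0.0, 0.0)] * 9
ones = [1.0] * 9

# With zero offsets and unit masks this reduces to an ordinary 3x3 convolution.
plain = deformable_conv_at(feature, weights, zero_offsets, ones, 1, 1)
```

With all offsets at zero and all masks at one, the result matches a standard convolution, which makes the role of the two extra learned quantities easy to see: offsets move where each tap looks, and masks control how much each tap counts.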
Pages: 853-864
Number of pages: 12