Disentangled Non-local Neural Networks

Cited by: 292
Authors
Yin, Minghao [1]
Yao, Zhuliang [1, 2]
Cao, Yue [2]
Li, Xiu [1]
Zhang, Zheng [2]
Lin, Stephen [2]
Hu, Han [2]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
Source
COMPUTER VISION - ECCV 2020, PT XV | 2020 / Vol. 12360
DOI
10.1007/978-3-030-58555-6_12
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405
Abstract
The non-local block is a popular module for strengthening the context modeling ability of a regular convolutional neural network. This paper first studies the non-local block in depth, where we find that its attention computation can be split into two terms, a whitened pairwise term accounting for the relationship between two pixels and a unary term representing the saliency of every pixel. We also observe that the two terms trained alone tend to model different visual clues, e.g. the whitened pairwise term learns within-region relationships while the unary term learns salient boundaries. However, the two terms are tightly coupled in the non-local block, which hinders the learning of each. Based on these findings, we present the disentangled non-local block, where the two terms are decoupled to facilitate learning for both terms. We demonstrate the effectiveness of the decoupled design on various tasks, such as semantic segmentation on Cityscapes, ADE20K and PASCAL Context, object detection on COCO, and action recognition on Kinetics. Code is available at https://github.com/yinmh17/DNL-Semantic-Segmentation and https://github.com/Howal/DNL-Object-Detection
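The split described in the abstract can be checked numerically. The sketch below is not the authors' code. It assumes the common dot-product form of non-local attention (a softmax over positions j of q_i · k_j) and takes "whitened" to mean subtracting the mean query and mean key. All function names are hypothetical, and the data is random toy input.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def centered(vector, mean):
    """Subtract the mean vector (the assumed 'whitening' step)."""
    return [x - m for x, m in zip(vector, mean)]


def nonlocal_attention(queries, keys):
    """Row i holds softmax over j of q_i . k_j (assumed dot-product form)."""
    return [softmax([dot(q, k) for k in keys]) for q in queries]


def split_attention(queries, keys):
    """Same weights, written as pairwise(i, j) + unary(j) inside one softmax."""
    mu_q = mean_vector(queries)
    mu_k = mean_vector(keys)
    unary = [dot(mu_q, k) for k in keys]  # depends on position j only
    rows = []
    for q in queries:
        pairwise = [dot(centered(q, mu_q), centered(k, mu_k)) for k in keys]
        rows.append(softmax([p + u for p, u in zip(pairwise, unary)]))
    return rows


def max_gap(rows_a, rows_b):
    """Largest absolute difference between two attention maps."""
    return max(abs(x - y) for ra, rb in zip(rows_a, rows_b) for x, y in zip(ra, rb))


if __name__ == "__main__":
    rng = random.Random(0)  # toy data: 6 "pixels" with 4-dimensional embeddings
    queries = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(6)]
    keys = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(6)]
    gap = max_gap(nonlocal_attention(queries, keys), split_attention(queries, keys))
    print(f"max difference between the two forms: {gap:.2e}")
```

The two forms agree because the terms left out of the split form (q_i · mu_k and mu_q · mu_k) do not depend on j, so they cancel inside the softmax. In this formulation both terms pass through a single softmax, which is one way to picture the coupling the abstract refers to.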
Pages: 191-207
Number of pages: 17