Channel and Spatial Enhancement Network for human parsing

Citations: 0
Authors
Liu, Kunliang [1, 2]
Jin, Rize [2 ]
Li, Yuelong [2 ]
Wang, Jianming [2 ]
Hwang, Wonjun [1 ]
Affiliations
[1] Ajou Univ, Dept Artificial Intelligence, 206 Worldcup Ro, Suwon 16499, South Korea
[2] Tiangong Univ, 399 Binshui West Rd, Tianjin 300387, People's Republic of China
Funding
National Research Foundation, Singapore; National Natural Science Foundation of China
Keywords
Human parsing; Semantic-spatial gaps; Feature alignment; High-semantic feature; High-resolution feature
DOI
10.1016/j.imavis.2024.105332
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The dominant backbones of neural networks for scene parsing consist of multiple stages, and feature maps in different stages contain varying levels of spatial and semantic information. High-level features carry more semantics and fewer spatial details, while low-level features carry fewer semantics and more spatial details. Consequently, there are semantic-spatial gaps among features at different levels, particularly in human parsing tasks. Many existing approaches directly upsample multi-stage features and aggregate them through addition or concatenation, without addressing these gaps. This leads to spatial misalignment, semantic mismatch, and ultimately misclassification in parsing. The problem is especially acute in human parsing, which demands both richer semantic information and finer feature-map detail because of intricate textures, diverse clothing styles, and heavy scale variability across human parts. In this paper, we alleviate the semantic-spatial gaps between features from different stages by using subtraction and addition operations to identify the semantic and spatial differences and compensate for them. Based on these principles, we propose the Channel and Spatial Enhancement Network (CSENet) for parsing, a straightforward and intuitive solution that injects high-semantic information into lower-stage features and, conversely, introduces fine details into higher-stage features. Extensive experiments on three dense prediction tasks demonstrate the efficacy of our method. Specifically, our method achieves the best performance on the LIP and CIHP datasets, and we also verify its generality on the ADE20K dataset.
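The subtraction-and-addition idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the CSENet implementation, whose modules are not described in this record. The function names (`upsample_nearest`, `downsample_mean`, `compensate`), the fixed blending weight `alpha`, the nearest-neighbour and average-pooling resampling, and the assumption that both feature maps share the same channel count are all hypothetical choices made for illustration.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def downsample_mean(x, factor):
    """Average-pool a (C, H, W) feature map by an integer factor."""
    c, h, w = x.shape
    return x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))

def compensate(low, high, alpha=0.5):
    """Hypothetical two-way compensation between two feature levels.

    low  : (C, H, W) high-resolution, low-semantic feature
    high : (C, H/f, W/f) low-resolution, high-semantic feature
    alpha: assumed fixed blending weight (a real network would learn this)
    """
    factor = low.shape[1] // high.shape[1]
    # Subtraction exposes what each level lacks relative to the other.
    semantic_gap = upsample_nearest(high, factor) - low
    detail_gap = downsample_mean(low, factor) - high
    # Addition injects the missing part back into each level.
    low_enhanced = low + alpha * semantic_gap
    high_enhanced = high + alpha * detail_gap
    return low_enhanced, high_enhanced

low = np.ones((2, 4, 4))         # stand-in for a lower-stage feature
high = 3.0 * np.ones((2, 2, 2))  # stand-in for a higher-stage feature
low_enhanced, high_enhanced = compensate(low, high)
print(low_enhanced.shape, high_enhanced.shape)
```

Each output keeps the resolution of its input, so the enhanced features could replace the originals in a standard upsample-and-aggregate decoder. The weight `alpha` is a placeholder for whatever learned weighting a real model would use.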
Pages: 12