MIANet: Aggregating Unbiased Instance and General Information for Few-Shot Semantic Segmentation

Cited by: 43
Authors
Yang, Yong [1 ]
Chen, Qiong [1 ]
Feng, Yuan [1 ,2 ]
Huang, Tianlin [1 ]
Affiliations
[1] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Peoples R China
[2] Guangdong Prov Key Lab Artificial Intelligence Me, Guangzhou, Peoples R China
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR | 2023
Funding
National Natural Science Foundation of China;
Keywords
NETWORK;
DOI
10.1109/CVPR52729.2023.00689
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Existing few-shot segmentation methods are based on the meta-learning strategy: they extract instance knowledge from a support set and then apply that knowledge to segment target objects in a query set. However, the extracted knowledge is insufficient to cope with variable intra-class differences, since it is obtained from only a few samples in the support set. To address the problem, we propose a multi-information aggregation network (MIANet) that effectively leverages general knowledge, i.e., semantic word embeddings, and instance information for accurate segmentation. Specifically, in MIANet, a general information module (GIM) is proposed to extract a general class prototype from word embeddings as a supplement to instance information. To this end, we design a triplet loss that treats the general class prototype as an anchor and samples positive-negative pairs from local features in the support set. The calculated triplet loss can transfer semantic similarities among language identities from a word embedding space to a visual representation space. To alleviate the model biasing towards the seen training classes and to obtain multi-scale information, we then introduce a non-parametric hierarchical prior module (HPM) to generate unbiased instance-level information by calculating the pixel-level similarity between the support and query image features. Finally, an information fusion module (IFM) combines the general and instance information to make predictions for the query image. Extensive experiments on PASCAL-5(i) and COCO-20(i) show that MIANet yields superior performance and sets a new state of the art. Code is available at github.com/Aldrich2y/MIANet.
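The two mechanisms the abstract describes can be illustrated concretely. Below is a minimal NumPy sketch, not the authors' implementation: `prior_mask` reflects one common reading of a non-parametric pixel-similarity prior (each query pixel scored by its best cosine match against support foreground pixels, then min-max normalized), and `triplet_loss` shows a standard margin-based triplet loss with the word-embedding class prototype as anchor. All function names and shapes are hypothetical; the paper's HPM additionally operates at multiple scales, which is omitted here.

```python
import numpy as np

def prior_mask(query_feat, support_feat, support_mask, eps=1e-7):
    """Pixel-similarity prior (sketch of an HPM-style computation):
    for each query pixel, take the max cosine similarity to any support
    foreground pixel, then min-max normalize to [0, 1]."""
    C, Hq, Wq = query_feat.shape
    q = query_feat.reshape(C, -1)                 # (C, Hq*Wq)
    s = support_feat.reshape(C, -1)               # (C, Hs*Ws)
    fg = support_mask.reshape(-1) > 0.5
    s = s[:, fg]                                  # keep foreground pixels only
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + eps)
    s = s / (np.linalg.norm(s, axis=0, keepdims=True) + eps)
    sim = q.T @ s                                 # (Hq*Wq, n_fg) cosine similarities
    prior = sim.max(axis=1)                       # best support match per query pixel
    prior = (prior - prior.min()) / (prior.max() - prior.min() + eps)
    return prior.reshape(Hq, Wq)

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Margin-based triplet loss: anchor is the general class prototype
    (from a word embedding), positive/negative are local support features."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)
```

Because `prior_mask` uses no learned parameters, it cannot overfit to the seen training classes, which is the "unbiased" property the abstract emphasizes.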
Pages: 7131-7140
Page count: 10