FEW-SHOT SEMANTIC SEGMENTATION WITH FREQUENCY PROTOTYPE LEARNING

Times cited: 0
Authors
Wen, Chunlin [1]
Hui, Huang [2]
Ma, Yan [2]
Yuan, Feiniu [2]
Zhu, Hongqing [3]
Zhu, Peng [4]
Affiliations
[1] Shanghai Normal Univ, Coll Informat Mech & Elect Engn, Shanghai, Peoples R China
[2] Shanghai Normal Univ, Coll Informat Mech & Elect Engn, Shanghai Engn Res Ctr Intelligent Educ & Bigdata, Shanghai, Peoples R China
[3] East China Univ Sci & Technol, Sch Informat Sci & Engn, Shanghai, Peoples R China
[4] Shanghai Vixdetect Inspect Equipment Co Ltd, Shanghai, Peoples R China
Keywords
Few-shot segmentation; few-shot learning; prototype learning; frequency domain learning; NETWORK;
DOI
10.31577/cai2025192
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Few-shot semantic segmentation is a challenging task that aims to segment new objects in a query image given only a few annotated support images. Most advanced methods for this task focus on either global or local prototype learning, using global average pooling (GAP) or clustering. However, because of the limitations of the averaging and clustering operations, these methods still fail to fully exploit the object information in the support images. To address these limitations, we propose a generalization of prototype learning in the frequency domain through multi-frequency pooling (MFP), which mines both local and global object information. Based on MFP, we further build a Frequency Prototype Network (FPNet) consisting of three novel designs. First, the Frequency Prototype Generation Module (FPGM) extracts frequency prototypes by MFP in the DCT domain to provide complete object guidance information. Second, the Prior Attention Mask Module (PAMM) produces a prior attention mask to identify the query target more precisely while retaining high generalization. Finally, the Frequency Prototype Selection Module (FPSM) selects the most effective support prototypes to reduce redundancy. Extensive experiments on PASCAL-5i and COCO-20i demonstrate that our model achieves state-of-the-art performance in both 1-shot and 5-shot settings.
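The abstract does not spell out how MFP is computed, so the following is only a rough sketch of one way pooling in the DCT domain can generalize GAP, not the authors' implementation. It assumes that each prototype is a mask-weighted sum of the support features against one 2D DCT-II basis function, normalized by the mask area. The function names (`dct_basis`, `multi_frequency_pool`, `masked_average_pool`), the normalization, and the toy data are all illustrative assumptions. Under these assumptions, the lowest frequency (0, 0) has a constant basis and reduces to ordinary masked average pooling, while higher frequencies weight spatial positions differently.

```python
import math


def dct_basis(u, v, height, width):
    """Unnormalized 2D DCT-II basis function for frequency index (u, v)."""
    return [
        [
            math.cos(math.pi * u * (y + 0.5) / height)
            * math.cos(math.pi * v * (x + 0.5) / width)
            for x in range(width)
        ]
        for y in range(height)
    ]


def multi_frequency_pool(features, mask, freqs):
    """Pool features[c][y][x] inside a binary mask, once per frequency.

    Returns one prototype (a list with one value per channel) for each
    (u, v) in freqs. Dividing by the mask area is an assumption of this sketch.
    """
    height, width = len(mask), len(mask[0])
    area = sum(sum(row) for row in mask)
    if area == 0:
        raise ValueError("mask must contain at least one foreground pixel")
    prototypes = []
    for u, v in freqs:
        basis = dct_basis(u, v, height, width)
        prototypes.append([
            sum(
                channel[y][x] * basis[y][x] * mask[y][x]
                for y in range(height)
                for x in range(width)
            ) / area
            for channel in features
        ])
    return prototypes


def masked_average_pool(features, mask):
    """Plain masked GAP, used here only as a reference for comparison."""
    area = sum(sum(row) for row in mask)
    return [
        sum(v * m for frow, mrow in zip(channel, mask) for v, m in zip(frow, mrow)) / area
        for channel in features
    ]


# Toy support features: 2 channels on a 2x3 grid, with a 3-pixel object mask.
features = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[0.5, 0.0, -1.0], [2.0, 1.5, 0.0]],
]
mask = [[1, 1, 0], [0, 1, 0]]

prototypes = multi_frequency_pool(features, mask, [(0, 0), (0, 1), (1, 0)])
gap = masked_average_pool(features, mask)
```

In this sketch `prototypes[0]` coincides with `gap`, which is the sense in which a set of frequency prototypes extends a single averaged prototype; how FPNet selects frequencies or combines the resulting prototypes is not specified in the abstract.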
Pages: 92-123
Page count: 32