Learning prototypes from background and latent objects for few-shot semantic segmentation

Cited by: 0

Authors:

Wang, Yicong [1]

Huang, Rong [1,3]

Zhou, Shubo [1,3]

Jiang, Xueqin [1,3]

Fang, Zhijun [2]

Affiliations:

[1] Donghua Univ, Coll Informat Sci & Technol, Shanghai 201620, Peoples R China

[2] Donghua Univ, Sch Comp Sci & Technol, Shanghai 201620, Peoples R China

[3] Donghua Univ, Engn Res Ctr Digitized Text & Apparel Technol, Minist Educ, Shanghai 201620, Peoples R China

Source:

KNOWLEDGE-BASED SYSTEMS | 2025 / Volume 314

Funding:

National Natural Science Foundation of China;

Keywords:

Semantic segmentation; Few-shot semantic segmentation; Prototype learning; Self-attention mechanism; NETWORK;

DOI:

10.1016/j.knosys.2025.113218

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Few-shot semantic segmentation (FSS) aims to segment the target object in a given image, guided by a few support samples with pixel-level annotations. Existing FSS frameworks primarily focus on the target area to learn a target-object prototype while neglecting non-target clues. As a result, the target-object prototype must both segment the target object and filter out the non-target area, which leads to numerous false positives. In this paper, we propose a background and latent-object prototype learning network (BLPLNet), which learns prototypes not only from the target area but also from its non-target counterpart. From our perspective, the non-target area consists of background full of repeated textures and of salient objects, referred to as latent objects in this paper. Specifically, a background mining module (BMM) is developed to learn a background prototype by episodic learning. The learned background prototype replaces the target-object prototype for background filtering, reducing false positives. Moreover, a latent object mining module (LOMM), based on the self-attention mechanism, works together with the BMM to learn multiple soft-orthogonal prototypes from latent objects. The learned latent-object prototypes, which condense general knowledge of objects, are then used in a target object enhancement module (TOEM) to enhance the target-object prototype under the guidance of affinity-based scores. Extensive experiments on the PASCAL-5i and COCO-20i datasets demonstrate the superiority of BLPLNet, which outperforms state-of-the-art methods by an average of 0.60% on PASCAL-5i. Ablation studies validate the effectiveness of each component, and visualization results indicate that the learned latent-object prototypes indeed convey general knowledge of objects.
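
The idea of filtering background with a dedicated background prototype, rather than relying on the target-object prototype alone, can be illustrated with a minimal NumPy sketch. This is not the BLPLNet implementation: the function names are hypothetical, masked average pooling and cosine matching are assumed here because they are common in prototype-based FSS, and the background prototype is pooled from the support background purely for illustration, whereas the abstract states that the BMM learns it by episodic learning.

```python
import numpy as np


def masked_average_pooling(features, mask):
    """Average the (C, H, W) features over pixels where mask (H, W) is 1.

    Assumed prototype extractor; the abstract does not specify one.
    """
    weights = mask.astype(features.dtype)
    total = weights.sum()
    if total == 0:
        return np.zeros(features.shape[0], dtype=features.dtype)
    return (features * weights).sum(axis=(1, 2)) / total


def cosine_map(features, prototype, eps=1e-8):
    """Cosine similarity between each pixel feature and a prototype, shape (H, W)."""
    feature_norm = np.linalg.norm(features, axis=0) + eps
    prototype_norm = np.linalg.norm(prototype) + eps
    dots = np.einsum("chw,c->hw", features, prototype)
    return dots / (feature_norm * prototype_norm)


def segment(query_features, target_prototype, background_prototype):
    """Label a pixel as target (1) when it is closer to the target prototype."""
    target_score = cosine_map(query_features, target_prototype)
    background_score = cosine_map(query_features, background_prototype)
    return (target_score > background_score).astype(np.uint8)


# Toy support sample: 2 channels, 2x2 pixels; left column is the target.
support_features = np.array([
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0]],
])
support_mask = np.array([[1, 0], [1, 0]])

target_prototype = masked_average_pooling(support_features, support_mask)
# Illustrative stand-in for a learned background prototype.
background_prototype = masked_average_pooling(support_features, 1 - support_mask)

query_features = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.1, 0.9], [0.8, 0.2]],
])
prediction = segment(query_features, target_prototype, background_prototype)
print(prediction)
```

In this sketch each pixel is assigned to whichever prototype it matches best, so the target-object prototype no longer has to reject non-target pixels by itself.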


Pages: 11
