Learning prototypes from background and latent objects for few-shot semantic segmentation

Cited by: 0
Authors
Wang, Yicong [1 ]
Huang, Rong [1 ,3 ]
Zhou, Shubo [1 ,3 ]
Jiang, Xueqin [1 ,3 ]
Fang, Zhijun [2 ]
Affiliations
[1] Donghua Univ, Coll Informat Sci & Technol, Shanghai 201620, Peoples R China
[2] Donghua Univ, Sch Comp Sci & Technol, Shanghai 201620, Peoples R China
[3] Donghua Univ, Engn Res Ctr Digitized Text & Apparel Technol, Minist Educ, Shanghai 201620, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Semantic segmentation; Few-shot semantic segmentation; Prototype learning; Self-attention mechanism; Network;
DOI
10.1016/j.knosys.2025.113218
CLC number
TP18 [Theory of artificial intelligence];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Few-shot semantic segmentation (FSS) aims to segment the target object within a given image, supported by only a few samples with pixel-level annotations. The existing FSS framework primarily focuses on the target area to learn a target-object prototype while neglecting non-target clues. As such, the target-object prototype must not only segment the target object but also filter out the non-target area, resulting in numerous false positives. In this paper, we propose a background and latent-object prototype learning network (BLPLNet), which learns prototypes not only from the target area but also from its non-target counterpart. From our perspective, the non-target area is delineated into background, which is full of repeated textures, and salient objects, referred to as latent objects in this paper. Specifically, a background mining module (BMM) is developed to learn a dedicated background prototype through episodic learning. The learned background prototype replaces the target-object one for background filtering, reducing false positives. Moreover, a latent object mining module (LOMM), based on the self-attention mechanism, works together with the BMM to learn multiple soft-orthogonal prototypes from latent objects. The learned latent-object prototypes, which condense general knowledge of objects, are then used in a target object enhancement module (TOEM) to enhance the target-object prototype under the guidance of affinity-based scores. Extensive experiments on the PASCAL-5i and COCO-20i datasets demonstrate the superiority of BLPLNet, which outperforms state-of-the-art methods by an average of 0.60% on PASCAL-5i. Ablation studies validate the effectiveness of each component, and visualization results indicate that the learned latent-object prototypes indeed convey general knowledge of objects.
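To make the prototype-learning idea in the abstract concrete, the following is a minimal sketch (not the authors' released code) of the generic prototype-matching step that such a network builds on: a target-object prototype and a background prototype are pooled from masked support features, and every query pixel is scored against both. The helper names, tensor shapes, temperature value, and cosine-similarity readout are assumptions for illustration only; the BMM, LOMM, and TOEM modules described in the abstract are not reproduced here.

# Minimal sketch (assumptions noted above), PyTorch-style.
import torch
import torch.nn.functional as F

def masked_average_pooling(feat, mask):
    """feat: (B, C, H, W); mask: (B, 1, H, W) with values in {0, 1}.
    Pools one prototype vector of shape (B, C) over the masked region."""
    mask = F.interpolate(mask, size=feat.shape[-2:], mode="nearest")
    return (feat * mask).sum(dim=(2, 3)) / (mask.sum(dim=(2, 3)) + 1e-6)

def segment_query(query_feat, support_feat, support_mask, temperature=20.0):
    """Scores every query pixel against a target-object prototype (pooled from
    the support foreground) and a background prototype (pooled from the support
    non-target area), then softmaxes over the two classes."""
    fg_proto = masked_average_pooling(support_feat, support_mask)        # (B, C)
    bg_proto = masked_average_pooling(support_feat, 1.0 - support_mask)  # (B, C)
    protos = torch.stack([bg_proto, fg_proto], dim=1)                    # (B, 2, C)
    q = F.normalize(query_feat, dim=1)                                   # unit-norm features
    p = F.normalize(protos, dim=2)                                       # unit-norm prototypes
    sim = torch.einsum("bchw,bkc->bkhw", q, p)                           # cosine similarity maps
    return (temperature * sim).softmax(dim=1)                            # per-pixel class probs

# Usage with random tensors standing in for backbone features:
B, C, H, W = 2, 256, 32, 32
query_feat = torch.randn(B, C, H, W)
support_feat = torch.randn(B, C, H, W)
support_mask = (torch.rand(B, 1, H, W) > 0.5).float()
probs = segment_query(query_feat, support_feat, support_mask)  # (B, 2, H, W)

In this sketch, query pixels are scored against an explicit background prototype rather than asking the target-object prototype to both match the target and reject everything else, which mirrors the false-positive argument made in the abstract.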
Pages: 11