Learning From Box Annotations for Referring Image Segmentation

Cited by: 5

Authors:

Feng, Guang [1]

Zhang, Lihe [1]

Hu, Zhiwei [1]

Lu, Huchuan [1]

Affiliation:

[1] Dalian Univ Technol, Sch Informat & Commun Engn, Dalian 116024, Peoples R China

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2024 / Vol. 35 / No. 03

Funding:

National Natural Science Foundation of China;

Keywords:

Proposals; Annotations; Image segmentation; Visualization; Semantics; Training; Noise measurement; Adversarial boundary loss; bounding box (BB) annotation; co-training (Co-T) strategy; weakly supervised referring image segmentation (RIS);

DOI:

10.1109/TNNLS.2022.3201372

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Referring image segmentation (RIS) has achieved impressive results with fully convolutional networks (FCNs). However, previous RIS methods require a large number of pixel-level annotations. In this article, we present a weakly supervised RIS method that uses bounding box (BB) annotations. In the first stage, we introduce an adversarial boundary loss to extract the object contour from the BB, which is then used to select appropriate region proposals for pseudoground-truth (PGT) generation. In the second stage, we design a co-training (Co-T) strategy to purify the pseudolabels. Specifically, we train two networks and have each one pick clean labels for the other, which weakens the effect of noisy labels on model training. Experimental results on four benchmark datasets demonstrate that the proposed method can produce high-quality masks at a speed of 63 frames/s.
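
The label-exchange step of the co-training strategy can be illustrated with a small sketch. The abstract does not say how "clean" labels are identified, so the code below assumes the small-loss criterion common in co-teaching-style methods: each network ranks the samples by its own loss, and the lowest-loss fraction is handed to the peer network for its update. The function name `cross_select` and the parameter `keep_ratio` are hypothetical and are not taken from the paper.

```python
import numpy as np


def cross_select(loss_a, loss_b, keep_ratio):
    """Pick likely-clean samples for two peer networks (assumed small-loss rule).

    loss_a, loss_b: per-sample losses computed by network A and network B
                    on the same mini-batch of pseudolabeled samples.
    keep_ratio:     fraction of the batch treated as clean (assumption).

    Returns (idx_for_a, idx_for_b): the sample indices that A and B
    should train on. Each network trains on the samples chosen by its peer.
    """
    loss_a = np.asarray(loss_a, dtype=float)
    loss_b = np.asarray(loss_b, dtype=float)
    if loss_a.ndim != 1 or loss_a.shape != loss_b.shape:
        raise ValueError("loss_a and loss_b must be 1-D arrays of equal length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")

    n_keep = max(1, int(round(keep_ratio * loss_a.size)))

    # Each network ranks samples by its own loss; small loss = likely clean.
    picked_by_a = np.argsort(loss_a, kind="stable")[:n_keep]
    picked_by_b = np.argsort(loss_b, kind="stable")[:n_keep]

    # Exchange: A is updated with B's picks, and B with A's picks.
    return np.sort(picked_by_b), np.sort(picked_by_a)


if __name__ == "__main__":
    idx_for_a, idx_for_b = cross_select(
        [0.1, 0.9, 0.2, 0.8], [0.7, 0.1, 0.3, 0.9], keep_ratio=0.5
    )
    print("A trains on:", idx_for_a.tolist())
    print("B trains on:", idx_for_b.tolist())
```

Exchanging the selections, rather than letting each network train on its own picks, is what keeps one network's mistakes from being reinforced by that same network; whether the paper uses this exact rule is not stated in the abstract.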

Pages: 3927-3937

Page count: 11
