Learning From Box Annotations for Referring Image Segmentation

Cited by: 5
Authors
Feng, Guang [1]
Zhang, Lihe [1]
Hu, Zhiwei [1]
Lu, Huchuan [1]
Affiliations
[1] Dalian Univ Technol, Sch Informat & Commun Engn, Dalian 116024, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Proposals; Annotations; Image segmentation; Visualization; Semantics; Training; Noise measurement; Adversarial boundary loss; bounding box (BB) annotation; co-training (Co-T) strategy; weakly supervised referring image segmentation (RIS);
DOI
10.1109/TNNLS.2022.3201372
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Referring image segmentation (RIS) has achieved impressive results with fully convolutional networks (FCNs). However, previous RIS methods require a large number of pixel-level annotations. In this article, we present a weakly supervised RIS method that uses bounding box (BB) annotations. In the first stage, we introduce an adversarial boundary loss to extract the object contour from the BB, which is then used to select appropriate region proposals for pseudo-ground-truth (PGT) generation. In the second stage, we design a co-training (Co-T) strategy to purify the pseudo-labels. Specifically, we train two networks and have each one pick clean labels for the other, which weakens the effect of noisy labels on model training. Experimental results on four benchmark datasets demonstrate that the proposed method can produce high-quality masks at a speed of 63 frames/s.
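The label exchange in the second stage can be illustrated with a minimal sketch. The abstract does not say how each network decides which labels are clean, so the small-loss criterion below (keep the samples with the lowest loss) is an assumption borrowed from common noisy-label training practice, not the authors' stated rule. The function names, the `keep_ratio` parameter, and the use of mean binary cross-entropy are likewise hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sample_loss(pred: Sequence[float], pseudo: Sequence[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between a flattened predicted mask
    (probabilities) and a flattened 0/1 pseudo-label mask."""
    total = 0.0
    for p, y in zip(pred, pseudo):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(pred)


def select_clean(losses: Sequence[float], keep_ratio: float) -> List[int]:
    """Indices of the keep_ratio fraction of samples with the smallest loss.
    ASSUMPTION: low loss is used as a proxy for a clean pseudo-label."""
    n_keep = max(1, int(round(keep_ratio * len(losses))))
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(order[:n_keep])


def co_training_exchange(
    losses_a: Sequence[float], losses_b: Sequence[float], keep_ratio: float
) -> Tuple[List[int], List[int]]:
    """Return (samples to train network A on, samples to train network B on).
    Each network is trained on the samples its peer judged to be clean."""
    picked_by_a = select_clean(losses_a, keep_ratio)
    picked_by_b = select_clean(losses_b, keep_ratio)
    return picked_by_b, picked_by_a


if __name__ == "__main__":
    # Hypothetical per-sample losses of the two networks on one mini-batch.
    losses_a = [0.1, 0.9, 0.2, 0.8]
    losses_b = [0.7, 0.1, 0.3, 0.9]
    train_a, train_b = co_training_exchange(losses_a, losses_b, keep_ratio=0.5)
    print("train A on:", train_a)
    print("train B on:", train_b)
```

Swapping the selections, rather than letting each network train on its own picks, is the point of the design: a network that has started to fit a noisy label tends to assign it a low loss, so its own selection would reinforce the error, whereas the peer network may still flag that sample as unreliable.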
Pages: 3927-3937
Page count: 11