Referring Image Segmentation by Generative Adversarial Learning

Cited by: 50
Authors
Qiu, Shuang [1 ,2 ]
Zhao, Yao [1 ,2 ]
Jiao, Jianbo [3 ]
Wei, Yunchao [4 ]
Wei, Shikui [1 ,2 ]
Affiliations
[1] Beijing Jiaotong Univ, Inst Informat Sci, Beijing 100044, Peoples R China
[2] Beijing Key Lab Adv Informat Sci & Network Techno, Beijing 100044, Peoples R China
[3] Univ Oxford, Dept Engn Sci, Oxford OX1 2JD, England
[4] Univ Illinois, Beckman Inst, Champaign, IL 61820 USA
Funding
National Natural Science Foundation of China;
Keywords
Image segmentation; Semantics; Feature extraction; Natural languages; Generators; Generative adversarial networks; Visualization; Image referring segmentation; Adversarial training; PRIMARY OBJECTS; TEXT;
DOI
10.1109/TMM.2019.2942480
CLC Classification Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
A referring expression is a natural language expression used to refer to a particular object. In this paper, we focus on the problem of image segmentation from natural language referring expressions. Existing works tackle this problem by augmenting a convolutional semantic segmentation network with an LSTM sentence encoder, optimized by a pixel-wise classification loss. We argue that the distribution similarity between the inference and the ground truth plays an important role in referring image segmentation, and therefore introduce a complementary loss that enforces consistency between the two distributions. To this end, we propose to train the referring image segmentation model in a generative adversarial fashion, which directly addresses the distribution similarity problem. In particular, the proposed adversarial semantic guidance network (ASGN) offers the following advantages: a) more detailed visual information is incorporated through detail enhancement; b) semantic information counteracts the word embedding impact; c) the proposed adversarial learning approach relieves the distribution inconsistency. Experimental results on four standard datasets show significant improvements over all compared baseline models, demonstrating the effectiveness of our method.
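
The abstract does not include implementation details, so the following is only a minimal, illustrative PyTorch sketch (not the authors' ASGN) of the general recipe it describes: a segmentation generator conditioned on a sentence embedding (e.g. from an LSTM encoder), trained with a pixel-wise loss plus an adversarial loss from a discriminator that compares predicted masks against ground-truth masks. All module names, tensor shapes, and the loss weight below are assumptions for illustration.

```python
# Illustrative sketch only: adversarial training for referring segmentation,
# combining a pixel-wise loss with an adversarial (distribution-matching) loss.
import torch
import torch.nn as nn

class ToySegmenter(nn.Module):
    """Fuses image features with a sentence embedding and predicts a mask."""
    def __init__(self, txt_dim=64):
        super().__init__()
        self.visual = nn.Conv2d(3, 16, 3, padding=1)
        self.text_proj = nn.Linear(txt_dim, 16)
        self.head = nn.Conv2d(16, 1, 1)

    def forward(self, image, sentence_emb):
        v = torch.relu(self.visual(image))                   # B x 16 x H x W
        t = self.text_proj(sentence_emb)[:, :, None, None]   # B x 16 x 1 x 1
        return self.head(v * torch.sigmoid(t))               # mask logits

class ToyDiscriminator(nn.Module):
    """Scores (image, mask) pairs as real (ground truth) or fake (predicted)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1))

    def forward(self, image, mask):
        return self.net(torch.cat([image, mask], dim=1))

bce = nn.BCEWithLogitsLoss()
G, D = ToySegmenter(), ToyDiscriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)

# Dummy batch: images, sentence embeddings, and ground-truth masks.
image = torch.randn(2, 3, 32, 32)
sent = torch.randn(2, 64)
gt_mask = (torch.rand(2, 1, 32, 32) > 0.5).float()

for _ in range(3):
    # Discriminator step: ground-truth masks -> real, predicted masks -> fake.
    with torch.no_grad():
        pred = torch.sigmoid(G(image, sent))
    d_loss = bce(D(image, gt_mask), torch.ones(2, 1)) + \
             bce(D(image, pred), torch.zeros(2, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: pixel-wise loss plus adversarial loss (fool D).
    logits = G(image, sent)
    g_loss = bce(logits, gt_mask) + \
             0.1 * bce(D(image, torch.sigmoid(logits)), torch.ones(2, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```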
Pages: 1333-1344
Number of pages: 12