Referring Expression Comprehension via Co-attention and Visual Context

Cited by: 1
Authors
Gao, Youming [1]
Ji, Yi [1]
Xu, Ting [1]
Xu, Yunlong [2]
Liu, Chunping [1]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Peoples R China
[2] Soochow Univ, Appl Tech Sch, Suzhou 215325, Peoples R China
Source
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: IMAGE PROCESSING, PT III | 2019 / Vol. 11729
Funding
National Natural Science Foundation of China;
Keywords
Neural network; Co-attention; Visual context; Referring expression comprehension;
DOI
10.1007/978-3-030-30508-6_10
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Referring expression comprehension, an active research topic in multimodal media analysis, locates the image region that a natural language expression refers to. Localization accuracy among similar objects is often degraded depending on whether the referring expression mentions supporting objects. To address this, we propose a referring expression comprehension method based on co-attention and visual context. When the expression lacks supporting objects, co-attention strengthens the attention paid to attributes in the subject module. When supporting objects are present, visual context is used to explore the latent link between the candidate object and its supporters. Experiments on three datasets, RefCOCO, RefCOCO+, and RefCOCOg, show that our approach outperforms published approaches by a considerable margin.
Pages: 119-130
Number of pages: 12
Related papers
26 in total
[1]   MUTAN: Multimodal Tucker Fusion for Visual Question Answering [J].
Ben-younes, Hedi ;
Cadene, Remi ;
Cord, Matthieu ;
Thome, Nicolas .
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, :2631-2639
[2]   Query-guided Regression Network with Context Policy for Phrase Grounding [J].
Chen, Kan ;
Kovvuri, Rama ;
Nevatia, Ram .
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, :824-832
[3]   Control of goal-directed and stimulus-driven attention in the brain [J].
Corbetta, M ;
Shulman, GL .
NATURE REVIEWS NEUROSCIENCE, 2002, 3 (03) :201-215
[4]   Visual Dialog [J].
Das, Abhishek ;
Kottur, Satwik ;
Gupta, Khushi ;
Singh, Avi ;
Yadav, Deshraj ;
Moura, Jose M. F. ;
Parikh, Devi ;
Batra, Dhruv .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :1080-1089
[5]  
He KM, 2017, IEEE I CONF COMP VIS, P2980, DOI [10.1109/TPAMI.2018.2844175, 10.1109/ICCV.2017.322]
[6]   Modeling Relationships in Referential Expressions with Compositional Modular Networks [J].
Hu, Ronghang ;
Rohrbach, Marcus ;
Andreas, Jacob ;
Darrell, Trevor ;
Saenko, Kate .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :4418-4427
[7]   Natural Language Object Retrieval [J].
Hu, Ronghang ;
Xu, Huazhe ;
Rohrbach, Marcus ;
Feng, Jiashi ;
Saenko, Kate ;
Darrell, Trevor .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :4555-4564
[8]   Inferring and Executing Programs for Visual Reasoning [J].
Johnson, Justin ;
Hariharan, Bharath ;
van der Maaten, Laurens ;
Hoffman, Judy ;
Li Fei-Fei ;
Zitnick, C. Lawrence ;
Girshick, Ross .
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, :3008-3017
[9]   Microsoft COCO: Common Objects in Context [J].
Lin, Tsung-Yi ;
Maire, Michael ;
Belongie, Serge ;
Hays, James ;
Perona, Pietro ;
Ramanan, Deva ;
Dollar, Piotr ;
Zitnick, C. Lawrence .
COMPUTER VISION - ECCV 2014, PT V, 2014, 8693 :740-755
[10]   Generating Diverse and Meaningful Captions Unsupervised Specificity Optimization for Image Captioning [J].
Lindh, Annika ;
Ross, Robert J. ;
Mahalunkar, Abhijit ;
Salton, Giancarlo ;
Kelleher, John D. .
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2018, PT I, 2018, 11139 :176-187