Selective Comprehension for Referring Expression by Prebuilt Entity Dictionary with Modular Networks

Cited by: 1
Authors
Cui, Enjie [1 ]
Wang, Jianming [1 ,2 ]
Liang, Jiayu [2 ]
Jin, Guanghao [2 ]
Affiliations
[1] Tianjin Polytech Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China
[2] Tianjin Polytech Univ, Sch Comp Sci & Software Engn, Tianjin, Peoples R China
Source
KNOWLEDGE MANAGEMENT AND ACQUISITION FOR INTELLIGENT SYSTEMS (PKAW 2018) | 2018 / Vol. 11016
Funding
National Natural Science Foundation of China;
Keywords
Referring expression; Selective comprehension; Entity dictionary; Modular networks;
DOI
10.1007/978-3-319-97289-3_16
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Referring expression comprehension, the task of localizing entities in an image based on a natural language expression, is still a challenging problem that is far from solved. In the literature, researchers have focused on how to localize the correct image region according to a natural language expression and have not questioned the correctness of the expression itself. In practical scenarios, however, wrong expressions are common. For example, there is a pumpkin on the table, but the expression is "there is a watermelon on the table". Clearly, a wrong expression can lead to an incorrect localization, which state-of-the-art approaches cannot avoid. In this paper, we propose modular networks to solve this problem, comprising three main parts: the expression filtering module, the expression analysis module and the localization module. Specifically, the expression filtering module uses an entity dictionary, prebuilt by an object detection method, that lists all the objects in the image, in order to determine whether an expression is correct. In this way, our model realizes selective comprehension of referring expressions: it outputs a "wrong expression" feedback instead of a wrong image region localization when an expression is determined to be wrong. Extensive experiments show that our model can efficiently filter wrong expressions and effectively solve the problem of referring expression comprehension in practical scenarios.
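The filtering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no architectural details, so the helper names (`build_entity_dictionary`, `extract_entities`, `comprehend`), the token-matching rule, the fixed `vocabulary` of known object names, and the stand-in `localize` callback are all assumptions made for illustration. The sketch only shows the idea of checking an expression against the set of objects detected in the image before any localization is attempted.

```python
from typing import Callable, Iterable, Set, Tuple, Union

Box = Tuple[int, int, int, int]  # hypothetical (x, y, width, height) region
WRONG_EXPRESSION = "wrong expression"


def build_entity_dictionary(detected_labels: Iterable[str]) -> Set[str]:
    """Collect the class labels an object detector reported for one image."""
    return {label.strip().lower() for label in detected_labels}


def extract_entities(expression: str, vocabulary: Set[str]) -> Set[str]:
    """Assumed rule: an entity is any token that is a known object name."""
    tokens = {tok.strip('.,"!?').lower() for tok in expression.split()}
    return tokens & vocabulary


def comprehend(
    expression: str,
    entity_dictionary: Set[str],
    vocabulary: Set[str],
    localize: Callable[[str], Box],
) -> Union[str, Box]:
    """Reject the expression if it names an object that is not in the image."""
    missing = extract_entities(expression, vocabulary) - entity_dictionary
    if missing:
        return WRONG_EXPRESSION
    # Only expressions that pass the filter reach the localization step.
    return localize(expression)


if __name__ == "__main__":
    vocabulary = {"pumpkin", "watermelon", "table"}
    dictionary = build_entity_dictionary(["Pumpkin", "table"])
    dummy_localize = lambda expr: (0, 0, 10, 10)  # placeholder localizer
    print(comprehend("there is a watermelon on the table",
                     dictionary, vocabulary, dummy_localize))
    print(comprehend("the pumpkin on the table",
                     dictionary, vocabulary, dummy_localize))
```

With the pumpkin example from the abstract, the first call returns the "wrong expression" feedback because "watermelon" is not in the dictionary, while the second call is passed on to the placeholder localizer.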
Pages: 211-220
Number of pages: 10
Related papers
18 in total
[1]   [Anonymous], 2014, arXiv
[2]   Dai J, 2016, PROCEEDINGS 2016 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL TECHNOLOGY (ICIT), P1796, DOI 10.1109/ICIT.2016.7475036
[3]   Modeling Relationships in Referential Expressions with Compositional Modular Networks [J].
Hu, Ronghang ;
Rohrbach, Marcus ;
Andreas, Jacob ;
Darrell, Trevor ;
Saenko, Kate .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :4418-4427
[4]   Natural Language Object Retrieval [J].
Hu, Ronghang ;
Xu, Huazhe ;
Rohrbach, Marcus ;
Feng, Jiashi ;
Saenko, Kate ;
Darrell, Trevor .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :4555-4564
[5]   Krähenbühl P, 2014, LECT NOTES COMPUT SC, V8693, P725, DOI 10.1007/978-3-319-10602-1_47
[6]   Microsoft COCO: Common Objects in Context [J].
Lin, Tsung-Yi ;
Maire, Michael ;
Belongie, Serge ;
Hays, James ;
Perona, Pietro ;
Ramanan, Deva ;
Dollar, Piotr ;
Zitnick, C. Lawrence .
COMPUTER VISION - ECCV 2014, PT V, 2014, 8693 :740-755
[7]   Referring Expression Generation and Comprehension via Attributes [J].
Liu, Jingyu ;
Wang, Liang ;
Yang, Ming-Hsuan .
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, :4866-4874
[8]   Comprehension-guided referring expressions [J].
Luo, Ruotian ;
Shakhnarovich, Gregory .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :3125-3134
[9]   Generation and Comprehension of Unambiguous Object Descriptions [J].
Mao, Junhua ;
Huang, Jonathan ;
Toshev, Alexander ;
Camburu, Oana ;
Yuille, Alan ;
Murphy, Kevin .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :11-20
[10]   Modeling Context Between Objects for Referring Expression Understanding [J].
Nagaraja, Varun K. ;
Morariu, Vlad I. ;
Davis, Larry S. .
COMPUTER VISION - ECCV 2016, PT IV, 2016, 9908 :792-807