Referring expression comprehension model with matching detection and linguistic feedback


Authors:

Wang, Jianming ^{[1,2]}

Cui, Enjie ^{[3]}

Liu, Kunliang ^{[1]}

Sun, Yukuan ^{[3]}

Liang, Jiayu ^{[1]}

Yuan, Chunmiao ^{[1]}

Duan, Xiaojie ^{[3]}

Jin, Guanghao ^{[1,4]}

Chung, Tae-Sun ^{[5]}

Affiliations:

[1] Tiangong Univ, Sch Comp Sci & Technol, Tianjin, Peoples R China

[2] Tiangong Univ, Tianjin Key Lab Autonomous Intelligence Technol &, Tianjin, Peoples R China

[3] Tiangong Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China

[4] Tiangong Univ, Tianjin Int Joint Res & Dev Ctr Autonomous Intell, Tianjin, Peoples R China

[5] Ajou Univ, Dept Comp Engn, Suwon 16499, South Korea

Source:

IET COMPUTER VISION | 2020 / Vol. 14 / Issue 08

Funding:

National Research Foundation of Singapore; National Natural Science Foundation of China;

Keywords:

SEGMENTATION; RECOGNITION; FEATURES; TEXTURE;

DOI:

10.1049/iet-cvi.2019.0483

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The task of referring expression comprehension (REC) is to localise an image region of a specific object described by a natural language expression, and all existing REC methods assume that the object described by the referring expression must be located in the given image. However, this assumption is not correct in some real applications. For example, a visually impaired user might tell his robot 'please take the laptop on the table to me'. In fact, the laptop is not on the table anymore. To address this problem, the authors propose a novel REC model to deal with the situation where expression-image mismatching occurs and explain the mismatching by linguistic feedback. The authors' REC model consists of four modules: the expression parsing module, the entity detection module, the relationship detection module, and the matching detection module. They built a data set called NP-RefCOCO+ from RefCOCO+ including both positive samples and negative samples. The positive samples are original expression-image pairs in RefCOCO+. The negative samples are the expression-image pairs in RefCOCO+, whose expressions are replaced. They evaluate the model on NP-RefCOCO+ and the experimental results show the advantages of their method for dealing with the problem of expression-image mismatching.
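
The abstract says only that negative samples are RefCOCO+ pairs whose expressions are replaced; it does not say how the replacement is chosen. The sketch below illustrates one possible construction, assuming the replacement is an expression drawn at random from a different image. The names `Sample` and `build_np_samples` are hypothetical and not taken from the paper, and an expression borrowed from another image could still happen to describe the current image, so a real pipeline would need an additional check.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    image_id: str
    expression: str
    matches: bool  # True = positive (original pair), False = negative (replaced)


def build_np_samples(pairs, seed=0):
    """Build positive and negative samples from (image_id, expression) pairs.

    Assumption (not from the paper): each negative sample keeps the image
    and swaps in an expression sampled from a different image.
    """
    rng = random.Random(seed)
    samples = []
    for image_id, expression in pairs:
        # Positive sample: the original expression-image pair.
        samples.append(Sample(image_id, expression, True))
        # Negative sample: same image, expression taken from another pair.
        candidates = [e for i, e in pairs if i != image_id and e != expression]
        if candidates:
            samples.append(Sample(image_id, rng.choice(candidates), False))
    return samples
```

Each input pair yields one positive sample and, when another expression is available, one negative sample, so a matching detection module could be trained or evaluated on the `matches` label.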


Pages: 625-633

Page count: 9
