Flexible Visual Grounding

Cited by: 0
Authors
Kim, Yongmin [1 ]
Chu, Chenhui [1 ]
Kurohashi, Sadao [1 ]
Affiliations
[1] Kyoto Univ, Kyoto, Japan
Source
PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022): STUDENT RESEARCH WORKSHOP | 2022
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Classification Code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Existing visual grounding datasets are artificially constructed: every query about an entity can be grounded to a corresponding image region, i.e., it is answerable. However, in real-world multimedia data such as news articles and social media, many entities mentioned in the text cannot be grounded to the image, i.e., they are unanswerable, because the text does not necessarily describe the accompanying image directly. A robust visual grounding model should be able to handle both answerable and unanswerable queries flexibly. To study this flexible visual grounding problem, we construct a pseudo dataset and a social media dataset containing both answerable and unanswerable queries. To handle unanswerable visual grounding, we propose a novel method that adds a pseudo image region corresponding to a query that cannot be grounded. The model is then trained to select ground-truth regions for answerable queries and the pseudo region for unanswerable queries. Our experiments show that the model can flexibly process both answerable and unanswerable queries with high accuracy on our datasets.
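The abstract's core idea, letting the model select a pseudo region when a query cannot be grounded, can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the authors' implementation: the class name, the dot-product scoring, the linear query projection, and the cross-entropy objective are all placeholders chosen only to show how a learned "no grounding" candidate could be appended to the region proposals.

```python
import torch
import torch.nn as nn

class FlexibleGrounder(nn.Module):
    """Sketch: score N candidate regions plus one learned pseudo region
    that stands for "this query is unanswerable" (hypothetical design)."""

    def __init__(self, dim: int = 512):
        super().__init__()
        # Learned embedding for the pseudo (no-grounding) region.
        self.pseudo_region = nn.Parameter(torch.randn(1, dim))
        self.query_proj = nn.Linear(dim, dim)

    def forward(self, region_feats: torch.Tensor, query_feat: torch.Tensor) -> torch.Tensor:
        # region_feats: (N, dim) visual features of candidate regions
        # query_feat:   (dim,)   feature of the textual query
        # Append the pseudo region as candidate index N.
        candidates = torch.cat([region_feats, self.pseudo_region], dim=0)  # (N+1, dim)
        q = self.query_proj(query_feat)                                    # (dim,)
        return candidates @ q                                              # (N+1,) scores

# Toy usage: answerable queries target a ground-truth region index;
# unanswerable queries target the pseudo region (index N).
model = FlexibleGrounder(dim=512)
regions = torch.randn(5, 512)              # 5 candidate regions
query = torch.randn(512)
scores = model(regions, query)             # shape (6,)

loss_fn = nn.CrossEntropyLoss()
target_unanswerable = torch.tensor([5])    # index N = pseudo region
loss = loss_fn(scores.unsqueeze(0), target_unanswerable)
```

Under this sketch, predicting index N at inference time would amount to declaring the query unanswerable, while any other index is an ordinary grounding decision.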
Pages: 285-299
Page count: 15