Flexible Visual Grounding

Cited by: 0
Authors
Kim, Yongmin [1]
Chu, Chenhui [1]
Kurohashi, Sadao [1]
Affiliation
[1] Kyoto Univ, Kyoto, Japan
Source
PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022): STUDENT RESEARCH WORKSHOP | 2022
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Existing visual grounding datasets are artificially constructed so that every query about an entity can be grounded to a corresponding image region, i.e., is answerable. However, in real-world multimedia data such as news articles and social media, many entities in the text cannot be grounded to the image, i.e., are unanswerable, because the text does not necessarily describe the accompanying image directly. A robust visual grounding model should be able to flexibly deal with both answerable and unanswerable visual grounding. To study this flexible visual grounding problem, we construct a pseudo dataset and a social media dataset including both answerable and unanswerable queries. To handle unanswerable visual grounding, we propose a novel method that adds a pseudo image region corresponding to a query that cannot be grounded. The model is then trained to ground to ground-truth regions for answerable queries and pseudo regions for unanswerable queries. In our experiments, we show that our model can flexibly process both answerable and unanswerable queries with high accuracy on our datasets.
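
The pseudo-region idea in the abstract can be illustrated with a minimal sketch. The abstract states only that a pseudo region is added for queries that cannot be grounded and that the model is trained to select it for unanswerable queries; everything else here is an assumption made for illustration: that the model emits one scalar score per candidate region, that the pseudo region is scored the same way and appended as the last candidate, and that training uses a softmax cross-entropy loss. The function names (`grounding_loss`, `predict`, and the helpers) are hypothetical and do not come from the paper.

```python
import math


def scores_with_pseudo_region(region_scores, pseudo_score):
    """Append the pseudo-region score as one extra candidate at the end (assumed layout)."""
    return list(region_scores) + [pseudo_score]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def grounding_loss(region_scores, pseudo_score, target_index=None):
    """Cross-entropy over the real regions plus the pseudo region (assumed loss).

    target_index is the ground-truth region index for an answerable query,
    or None for an unanswerable query, in which case the target is the
    pseudo region.
    """
    scores = scores_with_pseudo_region(region_scores, pseudo_score)
    target = len(scores) - 1 if target_index is None else target_index
    return -math.log(softmax(scores)[target])


def predict(region_scores, pseudo_score):
    """Return the index of the best real region, or None if the pseudo region wins."""
    scores = scores_with_pseudo_region(region_scores, pseudo_score)
    best = max(range(len(scores)), key=scores.__getitem__)
    return None if best == len(scores) - 1 else best
```

Under these assumptions, `predict([0.1, 2.0, 0.3], 0.5)` selects region 1, while `predict([0.1, 0.2], 1.5)` returns `None`, meaning the query is treated as unanswerable. Treating "no region" as one more class lets a single classifier cover both cases without a separate answerability threshold.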
Pages: 285-299
Page count: 15
Related papers
50 in total
  • [31] Grounding Visual Representations with Texts for Domain Generalization
    Min, Seonwoo
    Park, Nokyung
    Kim, Siwon
    Park, Seunghyun
    Kim, Jinkyu
    COMPUTER VISION, ECCV 2022, PT XXXVII, 2022, 13697 : 37 - 53
  • [32] Detecting and Grounding Important Characters in Visual Stories
    Liu, Danyang
    Keller, Frank
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 11, 2023, : 13210 - 13218
  • [33] Learning Comprehensive Visual Grounding for Video Captioning
    Jiang, Wenhui
    Liu, Linxin
    Fang, Yuming
    Cheng, Yibo
    Peng, Yuxin
    Liu, Yang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (04) : 3355 - 3367
  • [34] A Better Loss for Visual-Textual Grounding
    Rigoni, Davide
    Serafini, Luciano
    Sperduti, Alessandro
    37TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, 2022, : 49 - 57
  • [35] GROOViST: A Metric for Grounding Objects in Visual Storytelling
    Surikuchi, Aditya K.
    Pezzelle, Sandro
    Fernandez, Raquel
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 3331 - 3339
  • [36] Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning
    Yang, Li
    Xu, Yan
    Yuan, Chunfeng
    Liu, Wei
    Li, Bing
    Hu, Weiming
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 9489 - 9498
  • [37] Learning to Follow Verbal Instructions with Visual Grounding
    Unal, Emre
    Can, Ozan Arkan
    Yemez, Yucel
    2019 27TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2019,
  • [38] Visual Grounding Annotation of Recipe Flow Graph
    Nishimura, Taichi
    Tomori, Suzushi
    Hashimoto, Hayato
    Hashimoto, Atsushi
    Yamakata, Yoko
    Harashima, Jun
    Ushiku, Yoshitaka
    Mori, Shinsuke
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 4275 - 4284
  • [39] INVIGORATE: Interactive Visual Grounding and Grasping in Clutter
    Zhang, Hanbo
    Lu, Yunfan
    Yu, Cunjun
    Hsu, David
    Lan, Xuguang
    Zheng, Nanning
    ROBOTICS: SCIENCE AND SYSTEM XVII, 2021,
  • [40] Grounding Language Models for Visual Entity Recognition
    Xiao, Zilin
    Gong, Ming
    Cascante-Bonilla, Paola
    Zhang, Xingyao
    Wu, Jie
    Ordonez, Vicente
    COMPUTER VISION - ECCV 2024, PT XI, 2025, 15069 : 393 - 411