Flexible Visual Grounding

Cited by: 0
Authors
Kim, Yongmin [1 ]
Chu, Chenhui [1 ]
Kurohashi, Sadao [1 ]
Affiliations
[1] Kyoto Univ, Kyoto, Japan
Source
PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022): STUDENT RESEARCH WORKSHOP | 2022年
Keywords:
DOI: not available
CLC classification: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
Existing visual grounding datasets are artificially constructed, so that every query regarding an entity can be grounded to a corresponding image region, i.e., it is answerable. However, in real-world multimedia data such as news articles and social media, many entities in the text cannot be grounded to the image, i.e., they are unanswerable, because the text does not necessarily directly describe the accompanying image. A robust visual grounding model should be able to flexibly handle both answerable and unanswerable visual grounding. To study this flexible visual grounding problem, we construct a pseudo dataset and a social media dataset that include both answerable and unanswerable queries. To handle unanswerable visual grounding, we propose a novel method that adds a pseudo image region corresponding to a query that cannot be grounded. The model is then trained to ground answerable queries to ground-truth regions and unanswerable queries to pseudo regions. In our experiments, we show that our model can flexibly process both answerable and unanswerable queries with high accuracy on our datasets.
Pages: 285-299
Page count: 15
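The pseudo-region idea described in the abstract above can be sketched as a classification over candidate regions with one extra slot appended: answerable queries are trained toward their ground-truth region index, while unanswerable queries are trained toward the pseudo slot. This is only a minimal illustration of that training objective; the function name, the logit layout, and the use of a plain softmax cross-entropy are assumptions for exposition, not the authors' actual implementation.

```python
import math

def grounding_loss(region_scores, target_region, answerable):
    """Cross-entropy over region logits with a trailing pseudo slot.

    Hypothetical sketch (not the paper's code): region_scores holds
    logits for the candidate image regions plus one extra "no region"
    pseudo slot appended at the end. Answerable queries target their
    ground-truth region index; unanswerable queries target the pseudo
    slot.
    """
    pseudo_index = len(region_scores) - 1
    target = target_region if answerable else pseudo_index
    # Softmax over all slots, then negative log-likelihood of the target.
    exp_scores = [math.exp(s) for s in region_scores]
    return -math.log(exp_scores[target] / sum(exp_scores))
```

Under this sketch, a query whose logits favor a real region incurs a low loss when marked answerable, and a high loss when marked unanswerable, which is what pushes the model to route ungroundable queries to the pseudo region.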