Hybrid Graph Reasoning With Dynamic Interaction for Visual Dialog

Cited by: 1
Authors
Du, Shanshan [1 ,2 ]
Wang, Hanli [1 ,2 ]
Li, Tengpeng [1 ,2 ]
Chen, Chang Wen [3 ]
Affiliations
[1] Tongji Univ, Dept Comp Sci & Technol, Shanghai 201804, Peoples R China
[2] Tongji Univ, Serv Comp, Minist Educ, Key Lab Embedded Syst, Shanghai 200092, Peoples R China
[3] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Visualization; Cognition; Semantics; Task analysis; Routing; History; Transformers; Cross-modal interaction; dynamic routing; graph neural network; graph reasoning; visual dialog;
DOI
10.1109/TMM.2024.3385997
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
As a pivotal branch of intelligent human-computer interaction, visual dialog is a technically challenging task that requires artificial intelligence (AI) agents to answer consecutive questions based on image content and dialog history. Despite considerable progress, visual dialog still suffers from two major problems: (1) how to design flexible cross-modal interaction patterns instead of relying too heavily on expert experience and (2) how to infer the underlying semantic dependencies between dialog rounds effectively. To address these issues, an end-to-end framework employing dynamic interaction and hybrid graph reasoning is proposed in this work. Specifically, three major components are designed, and their practical benefits are demonstrated by extensive experiments. First, a dynamic interaction module is developed to automatically determine the optimal modality interaction route for varied questions; it consists of three functional interaction blocks, each equipped with a dynamic router. Second, a hybrid graph reasoning module is designed to explore semantic associations between dialog rounds from multiple perspectives, where the hybrid graph is constructed by aggregating a structured coreference graph and a context-aware temporal graph. Third, a unified one-stage visual dialog model with an end-to-end structure is developed to train the dynamic interaction module and the hybrid graph reasoning module collaboratively. Extensive experiments on the benchmark datasets VisDial v0.9 and VisDial v1.0 demonstrate the effectiveness of the proposed method compared to other state-of-the-art approaches.
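The abstract says the hybrid graph is built by aggregating a coreference graph and a temporal graph over dialog rounds, but it does not give the aggregation rule. The following is a minimal sketch of one plausible reading, not the authors' formulation. The weighted sum with `alpha`, the row normalisation, the single propagation step, and all function names are assumptions made for illustration.

```python
# Illustrative sketch only. The aggregation rule (weighted sum), the row
# normalisation, and every name below are assumptions, not the paper's method.

def temporal_graph(num_rounds):
    """Round i is linked to itself and to every earlier round (causal order)."""
    return [[1.0 if j <= i else 0.0 for j in range(num_rounds)]
            for i in range(num_rounds)]

def coreference_graph(num_rounds, coref_edges):
    """Self-loops plus edges (i, j) meaning round i refers back to round j."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_rounds)]
           for i in range(num_rounds)]
    for i, j in coref_edges:
        adj[i][j] = 1.0
    return adj

def hybrid_graph(coref, temporal, alpha=0.5):
    """Mix the two adjacency matrices, then normalise each row to sum to 1."""
    n = len(coref)
    mixed = [[alpha * coref[i][j] + (1.0 - alpha) * temporal[i][j]
              for j in range(n)] for i in range(n)]
    # Both inputs carry self-loops, so every row sum is strictly positive.
    return [[value / sum(row) for value in row] for row in mixed]

def propagate(adj, features):
    """One message-passing step: each round averages its neighbours' features."""
    dim = len(features[0])
    return [[sum(adj[i][j] * features[j][d] for j in range(len(features)))
             for d in range(dim)] for i in range(len(adj))]

if __name__ == "__main__":
    # Three dialog rounds; round 2 contains a pronoun that refers to round 0.
    A = hybrid_graph(coreference_graph(3, [(2, 0)]), temporal_graph(3))
    for row in A:
        print([round(v, 3) for v in row])
    print(propagate(A, [[1.0], [0.0], [0.0]]))
```

In this toy setup, round 2 weights round 0 more heavily than round 1, because round 0 is reachable through both the coreference edge and the temporal edge while round 1 is reachable only through the temporal one.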
Pages: 9095-9108
Page count: 14