Hybrid Graph Reasoning With Dynamic Interaction for Visual Dialog

Cited by: 1
Authors
Du, Shanshan [1, 2]
Wang, Hanli [1, 2]
Li, Tengpeng [1, 2]
Chen, Chang Wen [3]
Affiliations
[1] Tongji Univ, Dept Comp Sci & Technol, Shanghai 201804, Peoples R China
[2] Tongji Univ, Serv Comp, Minist Educ, Key Lab Embedded Syst, Shanghai 200092, Peoples R China
[3] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visualization; Cognition; Semantics; Task analysis; Routing; History; Transformers; Cross-modal interaction; dynamic routing; graph neural network; graph reasoning; visual dialog;
DOI
10.1109/TMM.2024.3385997
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
As a pivotal branch of intelligent human-computer interaction, visual dialog is a technically challenging task that requires artificial intelligence (AI) agents to answer consecutive questions based on image content and dialog history. Despite considerable progress, visual dialog still suffers from two major problems: (1) how to design flexible cross-modal interaction patterns instead of relying heavily on expert experience, and (2) how to effectively infer the underlying semantic dependencies between dialogs. To address these issues, an end-to-end framework employing dynamic interaction and hybrid graph reasoning is proposed in this work. Specifically, three major components are designed, and their practical benefits are demonstrated by extensive experiments. First, a dynamic interaction module, consisting of three functional interaction blocks equipped with dynamic routers, is developed to automatically determine the optimal modality interaction route for diverse questions. Second, a hybrid graph reasoning module is designed to explore semantic associations between dialogs from multiple perspectives, where the hybrid graph is constructed by aggregating a structured coreference graph and a context-aware temporal graph. Third, a unified one-stage visual dialog model with an end-to-end structure is developed to train the dynamic interaction module and the hybrid graph reasoning module collaboratively. Extensive experiments on the VisDial v0.9 and VisDial v1.0 benchmark datasets demonstrate the effectiveness of the proposed method compared with other state-of-the-art approaches.
Pages: 9095-9108
Number of pages: 14