Hybrid Graph Reasoning With Dynamic Interaction for Visual Dialog

Cited by: 1
Authors
Du, Shanshan [1, 2]
Wang, Hanli [1, 2]
Li, Tengpeng [1, 2]
Chen, Chang Wen [3]
Affiliations
[1] Tongji Univ, Dept Comp Sci & Technol, Shanghai 201804, Peoples R China
[2] Tongji Univ, Serv Comp, Minist Educ, Key Lab Embedded Syst, Shanghai 200092, Peoples R China
[3] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Visualization; Cognition; Semantics; Task analysis; Routing; History; Transformers; Cross-modal interaction; dynamic routing; graph neural network; graph reasoning; visual dialog;
DOI
10.1109/TMM.2024.3385997
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
As a pivotal branch of intelligent human-computer interaction, visual dialog is a technically challenging task that requires artificial intelligence (AI) agents to answer consecutive questions based on image content and dialog history. Despite considerable progress, visual dialog still faces two major problems: (1) how to design flexible cross-modal interaction patterns rather than relying too heavily on expert experience, and (2) how to effectively infer the underlying semantic dependencies between dialogues. To address these issues, this work proposes an end-to-end framework that employs dynamic interaction and hybrid graph reasoning. Specifically, three major components are designed, and their practical benefits are demonstrated by extensive experiments. First, a dynamic interaction module is developed to automatically determine the optimal modality interaction route for diverse questions; it consists of three functional interaction blocks equipped with dynamic routers. Second, a hybrid graph reasoning module is designed to explore adequate semantic associations between dialogues from multiple perspectives, where the hybrid graph is constructed by aggregating a structured coreference graph and a context-aware temporal graph. Third, a unified one-stage visual dialog model with an end-to-end structure is developed to train the dynamic interaction module and the hybrid graph reasoning module collaboratively. Experiments on the VisDial v0.9 and VisDial v1.0 benchmark datasets demonstrate the effectiveness of the proposed method compared with other state-of-the-art approaches.
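The abstract names two mechanisms without giving their equations: routers that weight several interaction blocks, and a hybrid graph formed by aggregating a coreference graph with a temporal graph. The following is a minimal illustrative sketch of those two ideas, not the authors' implementation. Everything in it is an assumption made for illustration: the softmax-weighted mixing in `route`, the lower-triangular `temporal_graph`, the convex combination in `hybrid_graph`, the `alpha` parameter, and all function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(features, blocks, router_scores):
    """Soft routing (assumed form): mix block outputs by softmax weights.

    `blocks` is a list of callables mapping a feature vector to a vector
    of the same length; `router_scores` has one score per block.
    """
    weights = softmax(router_scores)
    outputs = [block(features) for block in blocks]
    return [
        sum(w * out[i] for w, out in zip(weights, outputs))
        for i in range(len(features))
    ]


def temporal_graph(num_rounds):
    """Assumed temporal graph: round i links to itself and earlier rounds."""
    return [
        [1.0 if j <= i else 0.0 for j in range(num_rounds)]
        for i in range(num_rounds)
    ]


def hybrid_graph(coref, temporal, alpha=0.5):
    """Assumed aggregation: convex combination of two adjacency matrices."""
    return [
        [alpha * c + (1.0 - alpha) * t for c, t in zip(row_c, row_t)]
        for row_c, row_t in zip(coref, temporal)
    ]


if __name__ == "__main__":
    # Three stand-in "interaction blocks" (purely illustrative).
    blocks = [
        lambda x: list(x),
        lambda x: [2.0 * v for v in x],
        lambda x: [-v for v in x],
    ]
    print(route([1.0, 2.0], blocks, [0.0, 0.0, 0.0]))

    # Hypothetical coreference links: round 1 refers back to round 0.
    coref = [[0.0, 0.0], [1.0, 0.0]]
    print(hybrid_graph(coref, temporal_graph(2)))
```

With equal router scores, each block contributes one third of the mixed output; raising one score shifts weight toward that block. The paper itself should be consulted for the actual router design and graph construction.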
Pages: 9095-9108
Number of pages: 14