Visual Dialog with Multi-turn Attentional Memory Network

Citations: 2
Authors
Kong, Dejiang [1]
Wu, Fei [1]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Zhejiang, Peoples R China
Source
ADVANCES IN MULTIMEDIA INFORMATION PROCESSING, PT I | 2018 / Vol. 11164
Funding
National Natural Science Foundation of China;
Keywords
Visual dialog; Memory network; Multi-turn attention;
DOI
10.1007/978-3-030-00776-8_56
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Visual dialog is the task of answering a question given an input image and a historical dialog about that image, and it often requires retrieving visual and textual facts relevant to the question. This problem differs from visual question answering (VQA), which relies only on visual grounding estimated from an image and question pair, whereas visual dialog requires interactions among a question, an input image and a historical dialog. Most methods rely on a one-turn attention network to obtain facts w.r.t. a question. However, the information transition phenomenon that exists in these facts prevents these methods from retrieving all relevant information. In this paper, we propose a multi-turn attentional memory network for visual dialog. First, we propose an attentional memory network that maintains image regions and historical dialog in two memory banks and attends the question to be answered to both the visual and textual banks to obtain multi-modal facts. Further, considering the information transition phenomenon, we design a multi-turn attention architecture that attends to the memory banks over multiple turns to retrieve more facts in order to produce a better answer. We evaluate the proposed model on the VisDial v0.9 dataset, and the experimental results demonstrate the effectiveness of the proposed model.
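The abstract gives no equations, so the following is only a minimal sketch of the general idea of attending to a visual and a textual memory bank over several turns. The dot-product scoring, the additive query update, and the names `attend` and `multi_turn_attention` are illustrative assumptions, not the authors' formulation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, memory):
    """Return the attention-weighted sum of memory slots.

    Assumption: dot-product scoring between the query and each slot.
    """
    scores = [sum(q * m for q, m in zip(query, slot)) for slot in memory]
    weights = softmax(scores)
    dim = len(query)
    return [sum(w * slot[i] for w, slot in zip(weights, memory))
            for i in range(dim)]


def multi_turn_attention(question, visual_bank, textual_bank, turns=2):
    """Attend to both memory banks for a fixed number of turns.

    Assumption: after each turn the retrieved visual and textual facts
    are added to the query, so the next turn can reach facts that the
    original question alone would not retrieve.
    """
    query = list(question)
    for _ in range(turns):
        visual_fact = attend(query, visual_bank)
        textual_fact = attend(query, textual_bank)
        query = [q + v + t
                 for q, v, t in zip(query, visual_fact, textual_fact)]
    return query


# Toy 2-dimensional embeddings (made-up values for illustration only).
visual_bank = [[1.0, 0.0], [0.0, 1.0]]    # stand-ins for image regions
textual_bank = [[0.5, 0.5], [1.0, 1.0]]   # stand-ins for dialog history
fused = multi_turn_attention([1.0, 0.0], visual_bank, textual_bank, turns=2)
```

With `turns=1` the sketch reduces to a one-turn attention baseline of the kind the abstract contrasts against; larger values let earlier retrieved facts steer later retrieval.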
Pages: 611 - 621
Number of pages: 11