Radial Graph Convolutional Network for Visual Question Generation

Cited by: 42
Authors
Xu, Xing [1, 2]
Wang, Tan [1, 2]
Yang, Yang [1, 2]
Hanjalic, Alan [3]
Shen, Heng Tao [1, 2]
Affiliations
[1] Univ Elect Sci & Technol China, Ctr Future Multimedia, Chengdu 610051, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 610051, Peoples R China
[3] Delft Univ Technol, Sch Informat & Software Engn, NL-2628 CD Delft, Netherlands
Funding
National Natural Science Foundation of China;
Keywords
Task analysis; Visualization; Training; Data models; Semantics; Convolution; Cross-media understanding; graph convolutional network (GCN); visual question generation (VQG);
DOI
10.1109/TNNLS.2020.2986029
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
In this article, we address the problem of visual question generation (VQG), a challenge in which a computer is required to generate meaningful questions about an image targeting a given answer. Existing approaches typically treat the VQG task as a reversed visual question answering (VQA) task, which requires an exhaustive match among all the image regions and the given answer. To reduce the complexity, we propose an innovative answer-centric approach termed radial graph convolutional network (Radial-GCN) that focuses on the relevant image regions only. Our Radial-GCN method can quickly find the core answer area in an image by matching the latent answer with the semantic labels learned from all image regions. Then, a novel sparse graph with a radial structure is naturally built to capture the associations between the core node (i.e., the answer area) and the peripheral nodes (i.e., the other areas); graph attention is subsequently adopted to steer the convolutional propagation toward potentially more relevant nodes for the final question generation. Extensive experiments on three benchmark data sets show the superiority of our approach compared with the reference methods. Even in the challenging and previously unexplored zero-shot VQA task, the questions synthesized by our method remarkably boost the performance of several state-of-the-art VQA methods from 0% to over 40%. The implementation code of our proposed method and the successfully generated questions are available at https://github.com/Wangt-CN/VQG-GCN.
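The pipeline described in the abstract (pick the core answer region, build a radial graph around it, weight the edges by attention, propagate) can be illustrated with a toy sketch. This is not the authors' implementation, which is available at the repository linked above. Every name here (`find_core`, `radial_edges`, `attend_and_propagate`) is hypothetical, the embeddings are made-up two-dimensional vectors, and learned weight matrices are omitted. Cosine similarity and scaled dot-product attention are assumed stand-ins for whatever matching and attention functions the paper actually uses.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def find_core(label_embs, answer_emb):
    """Index of the region whose label embedding best matches the answer."""
    scores = [cosine(e, answer_emb) for e in label_embs]
    return max(range(len(scores)), key=scores.__getitem__)


def radial_edges(num_nodes, core):
    """Star-shaped (radial) edge list: the core is linked to every other node."""
    return [(core, j) for j in range(num_nodes) if j != core]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend_and_propagate(features, core, edges):
    """One attention-weighted propagation step toward the core node.

    Assumes at least one peripheral node. The 0.5/0.5 mix of old core
    features and aggregated neighbor features is an arbitrary choice
    made for this sketch, not a value from the paper.
    """
    dim = len(features[core])
    logits = [
        sum(a * b for a, b in zip(features[core], features[j])) / math.sqrt(dim)
        for _, j in edges
    ]
    weights = softmax(logits)
    aggregated = [
        sum(w * features[j][k] for w, (_, j) in zip(weights, edges))
        for k in range(dim)
    ]
    updated_core = [0.5 * (c + a) for c, a in zip(features[core], aggregated)]
    return weights, updated_core


# Toy data: three image regions with made-up 2-D embeddings.
label_embs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
answer_emb = [0.0, 1.0]
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

core = find_core(label_embs, answer_emb)
edges = radial_edges(len(features), core)
weights, updated_core = attend_and_propagate(features, core, edges)
```

Because the graph has only one edge per peripheral node rather than one per pair of regions, the number of edges grows linearly with the number of regions, which is the sparsity the abstract refers to.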
Pages: 1654-1667
Page count: 14
Related papers
50 in total
  • [21] ADGCN: An Asynchronous Dilation Graph Convolutional Network for Traffic Flow Prediction
    Qi, Tao
    Li, Guanghui
    Chen, Lingqiang
    Xue, Yanming
    IEEE INTERNET OF THINGS JOURNAL, 2022, 9 (05) : 4001 - 4014
  • [22] Latent Attention Network With Position Perception for Visual Question Answering
    Zhang, Jing
    Liu, Xiaoqiang
    Wang, Zhe
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, : 1 - 11
  • [23] Topological Graph Convolutional Network Based on Complex Network Characteristics
    Gao, He
    Yu, Xiang
    Sui, Yi
    Shao, Fengjing
    Sun, Rencheng
    IEEE ACCESS, 2022, 10 : 64465 - 64472
  • [24] A Convolutional Neural Network and Graph Convolutional Network Based Framework for Classification of Breast Histopathological Images
    Gao, Zhiyang
    Lu, Zhiyang
    Wang, Jun
    Ying, Shihui
    Shi, Jun
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2022, 26 (07) : 3163 - 3173
  • [25] Knowledge-Based Visual Question Generation
    Xie, Jiayuan
    Fang, Wenhao
    Cai, Yi
    Huang, Qingbao
    Li, Qing
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (11) : 7547 - 7558
  • [26] Multitask Learning for Visual Question Answering
    Ma, Jie
    Liu, Jun
    Lin, Qika
    Wu, Bei
    Wang, Yaxian
    You, Yang
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (03) : 1380 - 1394
  • [27] Graph-Based Multi-Interaction Network for Video Question Answering
    Gu, Mao
    Zhao, Zhou
    Jin, Weike
    Hong, Richang
    Wu, Fei
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 2758 - 2770
  • [28] Joint Graph Attention and Asymmetric Convolutional Neural Network for Deep Image Compression
    Tang, Zhisen
    Wang, Hanli
    Yi, Xiaokai
    Zhang, Yun
    Kwong, Sam
    Kuo, C. -C. Jay
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (01) : 421 - 433
  • [29] Recursive Multi-Relational Graph Convolutional Network for Automatic Photo Selection
    Xu, Wujiang
    Xu, Yifei
    Sang, Genan
    Li, Li
    Wang, Aichen
    Wei, Pingping
    Zhu, Li
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 3825 - 3840
  • [30] Adaptive Semantic-Spatio-Temporal Graph Convolutional Network for Lip Reading
    Sheng, Changchong
    Zhu, Xinzhong
    Xu, Huiying
    Pietikainen, Matti
    Liu, Li
    IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 24 : 3545 - 3557