ReverseGAN: An intelligent reverse generative adversarial networks system for complex image captioning generation

Cited by: 3
Authors
Tong, Guoxiang [1 ]
Shao, Wei [1 ]
Li, Yueyang [1 ]
Affiliations
[1] Univ Shanghai Sci & Technol, Coll Opt Elect & Comp Engn, Shanghai 200093, Peoples R China
Keywords
Image caption; Generative adversarial network; Text-to-image; Attention mechanism; Semantic consistency
DOI
10.1016/j.displa.2024.102653
CLC number: TP3 [Computing technology; computer technology]
Discipline code: 0812
Abstract
To caption images that contain complex semantic relations, we propose an intelligent Reverse Generative Adversarial Network (ReverseGAN) that uses a generative task as guidance to build an image captioning system. The system learns caption generation from regenerated images, with a generative adversarial network as the overall framework. The generator encodes images with a graph convolutional neural network and decodes the resulting image vectors into captions. A reverse text-to-image task serves as the discriminator: a text embedding module maps the captions produced by the generator to local word-level features and global sentence-level features, and a cascaded attention module uses these embeddings to generate images at coarse-to-fine scales, combining the global and local textual features so that the regenerated images stay close to the originals. On the MSCOCO dataset, our model outperforms current state-of-the-art methods on the BLEU, METEOR, and ROUGE metrics.
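The core idea of the abstract, that a caption is judged by how well a reverse text-to-image model can reconstruct the original image from it, can be illustrated with a minimal NumPy sketch. All names, dimensions, and the linear maps below are illustrative stand-ins, not the paper's actual GCN encoder, caption decoder, or cascaded-attention generator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature sizes (not from the paper).
IMG_DIM, CAP_DIM = 8, 6

# Generator: image features -> caption embedding
# (stand-in for the GCN encoder + caption decoder).
W_gen = rng.normal(size=(CAP_DIM, IMG_DIM)) * 0.1

# Discriminator's reverse path: caption embedding -> regenerated image
# (stand-in for the text embedding + cascaded attention modules).
W_rev = rng.normal(size=(IMG_DIM, CAP_DIM)) * 0.1

def generate_caption(img):
    """Map an image feature vector to a caption embedding."""
    return np.tanh(W_gen @ img)

def regenerate_image(cap):
    """Map a caption embedding back to image-feature space."""
    return W_rev @ cap

def semantic_consistency_loss(img):
    """Score a caption by the reconstruction error of the regenerated image:
    a semantically faithful caption should let the reverse model come close
    to the original image features."""
    regen = regenerate_image(generate_caption(img))
    return float(np.mean((regen - img) ** 2))

img = rng.normal(size=IMG_DIM)
loss = semantic_consistency_loss(img)
```

In an actual adversarial setup, the generator would be trained to drive this reconstruction-based discriminator signal down while the discriminator learns to expose captions that drop semantic content.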
Pages: 16