Visual Relationship Embedding Network for Image Paragraph Generation

Cited by: 14
Authors
Che, Wenbin [1, 2]
Fan, Xiaopeng [1, 2]
Xiong, Ruiqin [3]
Zhao, Debin [1, 2]
Affiliations
[1] Harbin Inst Technol, Res Ctr Intelligent Interface & Human Comp Intera, Dept Comp Sci & Technol, Harbin 150001, Peoples R China
[2] PengCheng Lab, Shenzhen 518055, Peoples R China
[3] Peking Univ, Inst Digital Media, Sch Elect Engn & Comp Sci, Beijing 100871, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Visualization; Semantics; Task analysis; Proposals; Automobiles; Buildings; Gallium nitride; Paragraph generation; image caption; region localization; attention network; visual relationship; GAN; LSTM; LANGUAGE;
DOI
10.1109/TMM.2019.2954750
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Image paragraph generation aims to produce a complete description of a given image. This task is more challenging than image captioning, which only generates one sentence to describe the entire image. Traditional paragraph generation methods usually produce paragraph descriptions based on individual regions that are detected by a Region Proposal Network (RPN). However, relationships among visual objects are either ignored or utilized in an implicit manner in previous work. In this paper, we attempt to explore more visual information through a novel paragraph generation network that explicitly incorporates visual relationship semantics when producing descriptions. First, a novel Relation Pair Generative Adversarial Network (RP-GAN) is designed to locate regions that may cover subjective or objective elements. Then, their relationships are inferred through an attention-based network. Finally, the visual features and relationship semantics of valid relation pairs are taken as inputs by a Long Short-Term Memory (LSTM) network for generating sentences. The experimental results show that by explicitly utilizing the predicted relationship information, our proposed method obtains more accurate and informative paragraph descriptions than previous methods.
Pages: 2307-2320
Page count: 14
Related papers
50 in total
  • [1] Paragraph Generation Network with Visual Relationship Detection
    Che, Wenbin
    Fan, Xiaopeng
    Xiong, Ruiqin
    Zhao, Debin
    PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 1435 - 1443
  • [2] Comprehensive Relation Modelling for Image Paragraph Generation
    Zhu, Xianglu
    Zhang, Zhang
    Wang, Wei
    Wang, Zilei
    MACHINE INTELLIGENCE RESEARCH, 2024, 21 (02) : 369 - 382
  • [3] Network Embedding With Dual Generation Tasks
    Li, Na
    Liu, Jie
    He, Zhicheng
    Zhang, Chunhai
    Xie, Jiaying
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (07) : 7303 - 7315
  • [4] Regularizing Visual Semantic Embedding With Contrastive Learning for Image-Text Matching
    Liu, Yang
    Liu, Hong
    Wang, Huaqiu
    Liu, Mengyuan
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 1332 - 1336
  • [5] Bypass network for semantics driven image paragraph captioning
    Zheng, Qi
    Wang, Chaoyue
    Wang, Dadong
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 249
  • [6] Radial Graph Convolutional Network for Visual Question Generation
    Xu, Xing
    Wang, Tan
    Yang, Yang
    Hanjalic, Alan
    Shen, Heng Tao
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (04) : 1654 - 1667
  • [7] VSAM-Based Visual Keyword Generation for Image Caption
    Zhang, Suya
    Zhang, Yana
    Chen, Zeyu
    Li, Zhaohui
    IEEE ACCESS, 2021, 9 : 27638 - 27649
  • [8] Matching Visual Features to Hierarchical Semantic Topics for Image Paragraph Captioning
    Guo, Dandan
    Lu, Ruiying
    Chen, Bo
    Zeng, Zequn
    Zhou, Mingyuan
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2022, 130 (08) : 1920 - 1937
  • [9] DFLLR: Deep Feature Learning With Latent Relationship Embedding for Remote Sensing Image Retrieval
    Liu, Li
    Wang, Yuebin
    Peng, Junhuan
    Plaza, Antonio
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [10] Bidirectional Relationship Inferring Network for Referring Image Localization and Segmentation
    Feng, Guang
    Hu, Zhiwei
    Zhang, Lihe
    Sun, Jiayu
    Lu, Huchuan
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (05) : 2246 - 2258