Visual Relationship Embedding Network for Image Paragraph Generation

Cited by: 14
Authors
Che, Wenbin [1, 2]
Fan, Xiaopeng [1, 2]
Xiong, Ruiqin [3]
Zhao, Debin [1, 2]
Affiliations
[1] Harbin Inst Technol, Res Ctr Intelligent Interface & Human Comp Intera, Dept Comp Sci & Technol, Harbin 150001, Peoples R China
[2] PengCheng Lab, Shenzhen 518055, Peoples R China
[3] Peking Univ, Inst Digital Media, Sch Elect Engn & Comp Sci, Beijing 100871, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Visualization; Semantics; Task analysis; Proposals; Automobiles; Buildings; Gallium nitride; Paragraph generation; image caption; region localization; attention network; visual relationship; GAN; LSTM; LANGUAGE;
DOI
10.1109/TMM.2019.2954750
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Image paragraph generation aims to produce a complete description of a given image. This task is more challenging than image captioning, which only generates one sentence to describe the entire image. Traditional paragraph generation methods usually produce paragraph descriptions based on individual regions that are detected by a Region Proposal Network (RPN). However, relationships among visual objects are either ignored or utilized in an implicit manner in previous work. In this paper, we attempt to explore more visual information through a novel paragraph generation network that explicitly incorporates visual relationship semantics when producing descriptions. First, a novel Relation Pair Generative Adversarial Network (RP-GAN) is designed to locate regions that may cover subjective or objective elements. Then, their relationships are inferred through an attention-based network. Finally, the visual features and relationship semantics of valid relation pairs are taken as inputs by a Long Short-Term Memory (LSTM) network for generating sentences. The experimental results show that by explicitly utilizing the predicted relationship information, our proposed method obtains more accurate and informative paragraph descriptions than previous methods.
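The attention step described in the abstract can be illustrated with a toy sketch. This is not the authors' RP-GAN or their attention network; it only shows, under simplifying assumptions, how ordered (subject, object) region pairs might be scored and softmax-weighted before a sentence decoder consumes them. All names here (`candidate_pairs`, `attend_over_pairs`, the region features, and the query vector) are hypothetical and chosen for illustration.

```python
import math
from itertools import permutations


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def candidate_pairs(regions):
    """Enumerate ordered (subject, object) pairs of region names."""
    return list(permutations(regions, 2))


def attend_over_pairs(regions, query):
    """Score each ordered pair and return attention weights and a context vector.

    A pair feature is the concatenation of the subject and object features,
    so (a, b) and (b, a) are scored differently. This scoring rule is an
    assumption made for the sketch, not the method from the paper.
    """
    pairs = candidate_pairs(regions)
    feats = [regions[s] + regions[o] for s, o in pairs]  # list concatenation
    weights = softmax([dot(query, f) for f in feats])
    # Attention-weighted sum of pair features: what a decoder could take as input.
    context = [sum(w * f[i] for w, f in zip(weights, feats))
               for i in range(len(query))]
    return pairs, weights, context


# Hypothetical 2-D region features and a 4-D query (subject part + object part).
regions = {"man": [1.0, 0.0], "horse": [0.0, 1.0], "field": [0.5, 0.5]}
query = [1.0, 0.0, 0.0, 1.0]

pairs, weights, context = attend_over_pairs(regions, query)
best_pair = pairs[weights.index(max(weights))]
print(best_pair, [round(w, 3) for w in weights])
```

In a full system the context vector (or the top-weighted pairs and their predicted relationship labels) would be passed to a sequence decoder such as an LSTM; that part is omitted here.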
Pages: 2307-2320
Page count: 14