Visual Relationship Embedding Network for Image Paragraph Generation

Cited by: 14
Authors
Che, Wenbin [1, 2]
Fan, Xiaopeng [1, 2]
Xiong, Ruiqin [3]
Zhao, Debin [1, 2]
Affiliations
[1] Harbin Inst Technol, Res Ctr Intelligent Interface & Human Comp Intera, Dept Comp Sci & Technol, Harbin 150001, Peoples R China
[2] PengCheng Lab, Shenzhen 518055, Peoples R China
[3] Peking Univ, Inst Digital Media, Sch Elect Engn & Comp Sci, Beijing 100871, Peoples R China
Funding
US National Science Foundation
Keywords
Visualization; Semantics; Task analysis; Proposals; Automobiles; Buildings; Gallium nitride; Paragraph generation; image caption; region localization; attention network; visual relationship; GAN; LSTM; LANGUAGE;
DOI
10.1109/TMM.2019.2954750
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Image paragraph generation aims to produce a complete description of a given image. This task is more challenging than image captioning, which generates only one sentence to describe the entire image. Traditional paragraph generation methods usually produce paragraph descriptions based on individual regions detected by a Region Proposal Network (RPN). However, previous work either ignores the relationships among visual objects or uses them only implicitly. In this paper, we attempt to exploit more visual information through a novel paragraph generation network that explicitly incorporates visual relationship semantics when producing descriptions. First, a novel Relation Pair Generative Adversarial Network (RP-GAN) is designed to locate regions that may cover subject or object elements. Then, their relationships are inferred through an attention-based network. Finally, the visual features and relationship semantics of valid relation pairs are taken as inputs by a Long Short-Term Memory (LSTM) network for generating sentences. The experimental results show that by explicitly utilizing the predicted relationship information, our proposed method obtains more accurate and informative paragraph descriptions than previous methods.
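The abstract names an attention-based relationship step but gives no architectural detail, so the following is only a toy illustration of the general idea and not the paper's network. It assumes a generic scaled dot-product attention over region features. The region vectors, the predicate embeddings, and the helper names (`attend`, `infer_relation`) are all hypothetical, and the RP-GAN and LSTM stages are not modeled here.

```python
import math
from itertools import permutations


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, regions):
    """Scaled dot-product attention: weight each region by similarity to the query."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, r) / scale for r in regions])
    context = [sum(w * r[i] for w, r in zip(weights, regions))
               for i in range(len(query))]
    return weights, context


def infer_relation(subj, obj, regions, predicates):
    """Pick the predicate whose (hypothetical) embedding best matches the attended context."""
    query = [s + o for s, o in zip(subj, obj)]  # assumption: pair query = feature sum
    _, context = attend(query, regions)
    scores = {name: dot(context, emb) for name, emb in predicates.items()}
    return max(scores, key=scores.get)


# Hypothetical 3-d region features and predicate embeddings, for illustration only.
regions = {"man": [1.0, 0.0, 0.0], "horse": [0.0, 1.0, 0.0], "field": [0.0, 0.0, 1.0]}
predicates = {"riding": [1.0, 1.0, 0.0], "standing on": [0.0, 1.0, 1.0]}

feats = list(regions.values())
# Stand-in for the region-pair stage: enumerate every ordered (subject, object) pair.
triples = [(s, infer_relation(regions[s], regions[o], feats, predicates), o)
           for s, o in permutations(regions, 2)]
```

In a pipeline like the one the abstract outlines, triples of this kind, together with the visual features of each pair, would be what the sentence generator consumes; how the paper filters for "valid" relation pairs is not stated in this record.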
Pages: 2307-2320
Page count: 14