Visual Relationship Embedding Network for Image Paragraph Generation

Times cited: 14
Authors
Che, Wenbin [1, 2]
Fan, Xiaopeng [1, 2]
Xiong, Ruiqin [3]
Zhao, Debin [1, 2]
Affiliations
[1] Harbin Inst Technol, Res Ctr Intelligent Interface & Human Comp Intera, Dept Comp Sci & Technol, Harbin 150001, Peoples R China
[2] PengCheng Lab, Shenzhen 518055, Peoples R China
[3] Peking Univ, Inst Digital Media, Sch Elect Engn & Comp Sci, Beijing 100871, Peoples R China
Funding
U.S. National Science Foundation
Keywords
Visualization; Semantics; Task analysis; Proposals; Automobiles; Buildings; Gallium nitride; Paragraph generation; image caption; region localization; attention network; visual relationship; GAN; LSTM; LANGUAGE;
DOI
10.1109/TMM.2019.2954750
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Image paragraph generation aims to produce a complete, multi-sentence description of a given image, which makes it more challenging than image captioning, where a single sentence describes the entire image. Traditional paragraph generation methods usually build descriptions from individual regions detected by a Region Proposal Network (RPN), while relationships among visual objects are either ignored or used only implicitly. In this paper, we exploit richer visual information through a novel paragraph generation network that explicitly incorporates visual relationship semantics when producing descriptions. First, a novel Relation Pair Generative Adversarial Network (RP-GAN) locates regions that may act as the subject or object of a relationship. Then, the relationships between these regions are inferred by an attention-based network. Finally, a Long Short-Term Memory (LSTM) network takes the visual features and relationship semantics of the valid relation pairs as input and generates the sentences. Experimental results show that, by explicitly using the predicted relationship information, the proposed method produces more accurate and informative paragraph descriptions than previous methods.
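The abstract describes the relationship-inference stage only at a high level. As a rough illustration of how attention can turn candidate subject-object region pairs into "valid relation pairs", here is a minimal sketch. It is not the authors' RP-GAN or attention network: it assumes region features are already available as plain vectors, swaps in simple dot-product attention over all regions to build a context vector for each ordered pair, scores a set of hypothetical predicate embeddings with a softmax, and keeps a pair only when its best predicate probability clears a hypothetical threshold. Every name, dimension, and value below is an illustrative assumption.

```python
import math
from itertools import permutations


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def attend(query, regions):
    """Dot-product attention: a convex combination of region features,
    weighted by each region's similarity to the query."""
    weights = softmax([dot(query, r) for r in regions])
    return [sum(w * r[d] for w, r in zip(weights, regions))
            for d in range(len(query))]


def infer_relations(regions, predicates, threshold=0.5):
    """Hypothetical relation inference over ordered (subject, object) pairs.

    regions:    list of feature vectors (assumed precomputed by a detector)
    predicates: dict mapping predicate name -> hypothetical embedding vector
    Returns (subject_idx, object_idx, predicate, prob) for pairs whose best
    predicate probability is at least `threshold`.
    """
    names = list(predicates)
    valid = []
    for s, o in permutations(range(len(regions)), 2):
        query = add(regions[s], regions[o])   # assumed pair representation
        context = attend(query, regions)      # image-level context for the pair
        pair_feat = add(query, context)
        probs = softmax([dot(pair_feat, predicates[n]) for n in names])
        best = max(range(len(names)), key=probs.__getitem__)
        if probs[best] >= threshold:
            valid.append((s, o, names[best], probs[best]))
    return valid
```

For example, with `regions = [[1.0, 0.0], [0.0, 1.0], [0.2, 0.2]]` and `predicates = {"on": [2.0, 0.0], "near": [0.0, 0.5]}`, calling `infer_relations(regions, predicates, threshold=0.0)` keeps all six ordered pairs, and raising the threshold filters out the pairs whose predicate prediction is uncertain. In a full pipeline, the surviving pairs and their predicate semantics would then be passed to the sentence generator.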
Pages: 2307-2320
Page count: 14