Visual Relationship Embedding Network for Image Paragraph Generation

Times cited: 14

Authors:

Che, Wenbin [1, 2]

Fan, Xiaopeng [1, 2]

Xiong, Ruiqin [3]

Zhao, Debin [1, 2]

Affiliations:

[1] Harbin Inst Technol, Res Ctr Intelligent Interface & Human Comp Intera, Dept Comp Sci & Technol, Harbin 150001, Peoples R China

[2] PengCheng Lab, Shenzhen 518055, Peoples R China

[3] Peking Univ, Inst Digital Media, Sch Elect Engn & Comp Sci, Beijing 100871, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2020 / Vol. 22 / Issue 9

Funding:

U.S. National Science Foundation;

Keywords:

Visualization; Semantics; Task analysis; Proposals; Automobiles; Buildings; Gallium nitride; Paragraph generation; image caption; region localization; attention network; visual relationship; GAN; LSTM; LANGUAGE;

DOI:

10.1109/TMM.2019.2954750

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Image paragraph generation aims to produce a complete description of a given image. This task is more challenging than image captioning, which generates only one sentence to describe the entire image. Traditional paragraph generation methods usually produce descriptions from individual regions detected by a Region Proposal Network (RPN). However, previous work either ignores the relationships among visual objects or uses them only implicitly. In this paper, we attempt to exploit more visual information through a novel paragraph generation network that explicitly incorporates visual relationship semantics when producing descriptions. First, a novel Relation Pair Generative Adversarial Network (RP-GAN) is designed to locate regions that may contain subject or object elements. Then, the relationships between these regions are inferred through an attention-based network. Finally, the visual features and relationship semantics of valid relation pairs are fed into a Long Short-Term Memory (LSTM) network to generate sentences. Experimental results show that by explicitly utilizing the predicted relationship information, the proposed method produces more accurate and informative paragraph descriptions than previous methods.
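The relation-inference and decoder-input steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: RP-GAN region localization and the LSTM decoder itself are omitted, region features are plain Python lists, and the predicate-embedding table, the scaled dot-product attention, the `[subject; object; relation]` concatenation, and the `min_weight` validity threshold are all hypothetical choices made only for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def infer_relation(
    subj: Vector, obj: Vector, predicates: Dict[str, Vector]
) -> Tuple[str, float, Vector]:
    """Attend over HYPOTHETICAL predicate embeddings for one subject-object pair.

    Returns the most-attended predicate, its attention weight, and the
    attention-weighted sum of predicate embeddings (a soft relation vector).
    """
    pair = subj + obj  # concatenate subject and object region features
    names = list(predicates)
    scale = math.sqrt(len(pair))  # scaled dot-product attention (an assumption)
    weights = softmax([dot(pair, predicates[n]) / scale for n in names])
    relation = [
        sum(w * predicates[n][i] for w, n in zip(weights, names))
        for i in range(len(pair))
    ]
    best = max(range(len(names)), key=lambda k: weights[k])
    return names[best], weights[best], relation


def lstm_inputs(
    pairs: List[Tuple[Vector, Vector]],
    predicates: Dict[str, Vector],
    min_weight: float = 0.6,
) -> List[Tuple[str, Vector]]:
    """Keep pairs whose top attention weight reaches a HYPOTHETICAL threshold,
    and build one decoder input per kept pair: [subject; object; relation]."""
    kept = []
    for subj, obj in pairs:
        label, weight, relation = infer_relation(subj, obj, predicates)
        if weight >= min_weight:
            kept.append((label, subj + obj + relation))
    return kept


# Toy usage: 2-d region features and two made-up 4-d predicate embeddings.
predicates = {"on": [1.0, 0.0, 0.0, 1.0], "near": [0.0, 1.0, 1.0, 0.0]}
pairs = [([2.0, 0.0], [0.0, 2.0]), ([0.1, 0.1], [0.1, 0.1])]
result = lstm_inputs(pairs, predicates)
```

In this toy run, the first pair attends strongly to "on" and is kept as an 8-dimensional decoder input, while the second pair splits its attention evenly between both predicates and is discarded as not being a confident relation pair. A soft, attention-weighted relation vector is used here so that the decoder input carries the uncertainty over predicates rather than only a hard label.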


Pages: 2307-2320

Page count: 14
