Visual Relationship Embedding Network for Image Paragraph Generation

Cited by: 14

Authors:

Che, Wenbin ^{[1,2]}

Fan, Xiaopeng ^{[1,2]}

Xiong, Ruiqin ^{[3]}

Zhao, Debin ^{[1,2]}

Affiliations:

[1] Harbin Inst Technol, Res Ctr Intelligent Interface & Human Comp Intera, Dept Comp Sci & Technol, Harbin 150001, Peoples R China

[2] PengCheng Lab, Shenzhen 518055, Peoples R China

[3] Peking Univ, Inst Digital Media, Sch Elect Engn & Comp Sci, Beijing 100871, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2020 / Vol. 22 / Issue 09

Funding:

U.S. National Science Foundation;

Keywords:

Visualization; Semantics; Task analysis; Proposals; Automobiles; Buildings; Gallium nitride; Paragraph generation; image caption; region localization; attention network; visual relationship; GAN; LSTM; LANGUAGE;

DOI:

10.1109/TMM.2019.2954750

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Image paragraph generation aims to produce a complete description of a given image. This task is more challenging than image captioning, which only generates one sentence to describe the entire image. Traditional paragraph generation methods usually produce paragraph descriptions based on individual regions that are detected by a Region Proposal Network (RPN). However, relationships among visual objects are either ignored or utilized in an implicit manner in previous work. In this paper, we attempt to explore more visual information through a novel paragraph generation network that explicitly incorporates visual relationship semantics when producing descriptions. First, a novel Relation Pair Generative Adversarial Network (RP-GAN) is designed to locate regions that may cover subjective or objective elements. Then, their relationships are inferred through an attention-based network. Finally, the visual features and relationship semantics of valid relation pairs are taken as inputs by a Long Short-Term Memory (LSTM) network for generating sentences. The experimental results show that by explicitly utilizing the predicted relationship information, our proposed method obtains more accurate and informative paragraph descriptions than previous methods.
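The abstract does not give the formulation of its attention-based relationship network. As a rough illustration of what one attention step over candidate relation-pair features could look like, here is a minimal dot-product attention sketch in plain Python. All names (`attend`, `pair_features`, `query`) and the toy values are hypothetical, and this is a generic textbook mechanism, not the paper's RP-GAN or relationship model.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, features):
    """Generic dot-product attention (an assumption, not the paper's design).

    Scores each feature vector by its dot product with the query,
    normalizes the scores with softmax, and returns the weights together
    with the weighted sum of the feature vectors (the context vector).
    """
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [
        sum(w * feat[i] for w, feat in zip(weights, features))
        for i in range(dim)
    ]
    return weights, context


# Hypothetical toy features for three candidate relation pairs.
pair_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical query vector, e.g. a language-model hidden state.
query = [1.0, 0.0]

weights, context = attend(query, pair_features)
```

In a pipeline like the one the abstract outlines, a context vector of this kind would be one plausible input to the sentence-generating LSTM, alongside the visual features; how the paper actually combines them is not stated here.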


Pages: 2307 - 2320

Page count: 14

50 items in total

[21] VD-SAN: Visual-Densely Semantic Attention Network for Image Caption Generation
He, Xinwei
Yang, Yang
Shi, Baoguang
Bai, Xiang
NEUROCOMPUTING, 2019, 328 : 48 - 55
[22] A Dual-Attention Learning Network With Word and Sentence Embedding for Medical Visual Question Answering
Huang, Xiaofei
Gong, Hongfang
IEEE TRANSACTIONS ON MEDICAL IMAGING, 2024, 43 (02) : 832 - 845
[23] Chinese Image Caption Generation via Visual Attention and Topic Modeling
Liu, Maofu
Hu, Huijun
Li, Lingjun
Yu, Yan
Guan, Weili
IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (02) : 1247 - 1257
[24] Graph-Based Visual-Semantic Entanglement Network for Zero-Shot Image Recognition
Hu, Yang
Wen, Guihua
Chapman, Adriane
Yang, Pei
Luo, Mingnan
Xu, Yingxue
Dai, Dan
Hall, Wendy
IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 2473 - 2487
[25] A Multimodality Scene Graph Generation Approach for Robust Human-Robot Collaborative Assembly Visual Relationship Representation
Lv, Jianhao
Zhang, Rong
Li, Xinyu
Liu, Shimin
Liu, Tianyuan
Zhang, Qi
Bao, Jinsong
IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2024, 20 (03) : 3242 - 3251
[26] Multi-Gate Attention Network for Image Captioning
Jiang, Weitao
Li, Xiying
Hu, Haifeng
Lu, Qiang
Liu, Bohong
IEEE ACCESS, 2021, 9 : 69700 - 69709
[27] Improving Visual Relationship Detection With Two-Stage Correlation Exploitation
Zhou, Hao
Zhang, Chongyang
Zhao, Muming
Luo, Yan
Hu, Chuanping
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (07) : 2751 - 2763
[28] A Mutually Textual and Visual Refinement Network for Image-Text Matching
Pang, Shanmin
Zeng, Yueyang
Zhao, Jiawei
Xue, Jianru
IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 7555 - 7566
[29] Effective Multimodal Encoding for Image Paragraph Captioning
Nguyen, Thanh-Son
Fernando, Basura
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 6381 - 6395
[30] Dual-Affinity Style Embedding Network for Semantic-Aligned Image Style Transfer
Ma, Zhuoqi
Lin, Tianwei
Li, Xin
Li, Fu
He, Dongliang
Ding, Errui
Wang, Nannan
Gao, Xinbo
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (10) : 7404 - 7417
