Image paragraph generation aims to produce a complete description of a given image. This task is more challenging than image captioning, which generates only a single sentence to describe the entire image. Traditional paragraph generation methods usually build paragraph descriptions from individual regions detected by a Region Proposal Network (RPN). However, previous work either ignores the relationships among visual objects or uses them only implicitly. In this paper, we explore richer visual information through a novel paragraph generation network that explicitly incorporates visual relationship semantics when producing descriptions. First, a novel Relation Pair Generative Adversarial Network (RP-GAN) is designed to locate regions that may cover the subject or object of a relationship. Then, the relationships between these regions are inferred through an attention-based network. Finally, the visual features and relationship semantics of valid relation pairs are fed into a Long Short-Term Memory (LSTM) network to generate sentences. Experimental results show that, by explicitly using the predicted relationship information, the proposed method produces more accurate and informative paragraph descriptions than previous methods.
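The abstract does not give the details of the attention-based relationship network, so the following is only a minimal sketch of the general idea: scoring candidate predicates for one (subject, object) region pair with softmax attention. It is not the authors' architecture. The function names, the two-dimensional toy features, and the predicate embeddings are all hypothetical and chosen purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def infer_relation(subject_feat, object_feat, predicate_embeddings):
    """Score each candidate predicate against a region pair.

    The pair feature is the concatenation of the subject and object
    features (an assumption made for this sketch). Returns the
    highest-weighted predicate and the full attention distribution.
    """
    pair_feat = subject_feat + object_feat  # list concatenation
    names = list(predicate_embeddings)
    scores = [dot(pair_feat, predicate_embeddings[n]) for n in names]
    weights = softmax(scores)
    best = max(range(len(names)), key=lambda i: weights[i])
    return names[best], dict(zip(names, weights))


# Toy inputs (hypothetical): 2-d region features, 4-d predicate embeddings.
subject = [1.0, 0.0]
obj = [0.0, 1.0]
predicates = {
    "on": [1.0, 0.0, 0.0, 1.0],
    "under": [-1.0, 0.0, 0.0, -1.0],
    "next to": [0.0, 0.0, 0.0, 0.0],
}

label, weights = infer_relation(subject, obj, predicates)
print(label, weights)
```

In a full pipeline of the kind the abstract describes, the selected predicate and the pair's visual features would then be passed to a sentence decoder such as an LSTM; that step is omitted here.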
Affiliations:
Beihang Univ, Beijing Key Lab Digital Media, Beijing 100191, Peoples R China
Beihang Univ, State Key Lab Virtual Real Technol & Syst, Beijing 100191, Peoples R China
Wang, Chendan
Chen, Bowen
Affiliations:
Beihang Univ, Beijing Key Lab Digital Media, Beijing 100191, Peoples R China
Beihang Univ, State Key Lab Virtual Real Technol & Syst, Beijing 100191, Peoples R China
Zou, Zhengxia
Affiliations:
Beihang Univ, Sch Astronaut, Dept Guidance Nav & Control, Beijing 100191, Peoples R China
Shanghai Artificial Intelligence Lab, Shanghai 200232, Peoples R China
Shi, Zhenwei
Affiliations:
Beihang Univ, Beijing Key Lab Digital Media, Beijing 100191, Peoples R China
Beihang Univ, State Key Lab Virtual Real Technol & Syst, Beijing 100191, Peoples R China
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
Affiliations:
Xi An Jiao Tong Univ, Sch Comp Sci & Technol, Xian 710049, Shaanxi, Peoples R China
Xi An Jiao Tong Univ, Natl Engn Lab Big Data Analyt, Xian 710049, Shaanxi, Peoples R China
Zeng, Hongwei
Zhi, Zhuo
Affiliations:
Xi An Jiao Tong Univ, Sch Elect Engn, Xian 710049, Shaanxi, Peoples R China
Liu, Jun
Affiliations:
Xi An Jiao Tong Univ, Sch Comp Sci & Technol, Xian 710049, Shaanxi, Peoples R China
Xi An Jiao Tong Univ, Natl Engn Lab Big Data Analyt, Xian 710049, Shaanxi, Peoples R China
Wei, Bifan
Affiliations:
Xi An Jiao Tong Univ, Natl Engn Lab Big Data Analyt, Xian 710049, Shaanxi, Peoples R China
Xi An Jiao Tong Univ, Sch Continuing Educ, Xian 710049, Shaanxi, Peoples R China