Stylized image captioning (SIC) aims to generate captions in a target style for images. The biggest challenge is that collecting and annotating stylized data is difficult and time-consuming. Most existing methods learn from massive factual captions or additional stylized book corpora independently to assist stylized caption generation, ignoring the core relationships within existing image-fact-style trident data. In this paper, we propose TridentCap, a novel image-fact-style trident semantic framework for stylized image captioning, which includes an image-fact semantic fusion encoder (SFE) and a trident stylization decoder (TSD). Unlike existing methods, we directly mine the core relationships in image-fact-style trident data and use factual semantics and images to build a cross-modal semantic feature space, achieving coherence between image and text. Specifically, SFE learns image-related prior language knowledge from factual text and leverages fine-grained region-level semantic correlations between the image and the factual text to align and integrate cross-modal semantic information. TSD is designed to decouple the dual-source fused semantic features according to the target style to generate stylized captions. In addition, we design a pseudo labels filter (PLF) that expands the image-fact-style trident data by building pseudo stylized annotations for all image-fact data in traditional caption datasets, which further strengthens stylized caption learning. PLF is a generic algorithm for addressing insufficient data and can be integrated into any existing stylized captioning model. We conduct extensive experiments on the SentiCap and FlickrStyle datasets, achieving consistent improvements on almost all metrics. Our code will be released at: https://github.com/WangLanxiao/TridentCap_Code.
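The abstract does not specify how SFE computes region-level correlations between the image and the factual text. As a rough illustration only, the sketch below uses generic scaled dot-product cross-attention, which is an assumption and not the authors' implementation: each image region attends over the factual-caption tokens and is concatenated with its attended text summary. The function name `fuse_regions_with_fact` and the toy features are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse_regions_with_fact(region_feats, token_feats):
    """Hypothetical fusion step (an assumption, not the SFE from the paper).

    For every image region, compute attention weights over the factual
    caption tokens, build a weighted text summary, and concatenate it
    with the region feature.
    """
    dim = len(token_feats[0])
    fused, all_weights = [], []
    for region in region_feats:
        # Region-to-token correlation scores, scaled by sqrt(dim).
        scores = [dot(region, tok) / math.sqrt(dim) for tok in token_feats]
        weights = softmax(scores)
        # Attended text summary for this region.
        context = [
            sum(w * tok[i] for w, tok in zip(weights, token_feats))
            for i in range(dim)
        ]
        fused.append(list(region) + context)
        all_weights.append(weights)
    return fused, all_weights


# Toy example: 2 image regions, 3 factual-caption tokens, feature size 2.
regions = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
fused, weights = fuse_regions_with_fact(regions, tokens)
```

In this toy setup, each region places its largest attention weight on the token it is most correlated with, and every fused vector has twice the original feature size. A real model would use learned projections and multi-head attention instead of raw dot products.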
Zhang, Miao
Affiliation: Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen 518055, Guangdong, Peoples R China
Yin, Jun
Affiliation: Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen 518055, Guangdong, Peoples R China
Zeng, Pengyu
Affiliation: Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen 518055, Guangdong, Peoples R China
Shen, Yiqing
Affiliation: Johns Hopkins Univ, 3400 N Charles St, Baltimore, MD 21218 USA
Lu, Shuai
Affiliation: Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen 518055, Guangdong, Peoples R China
Wang, Xueqian
Affiliation: Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen 518055, Guangdong, Peoples R China
Chen, Dong
Affiliation: China Univ Geosci, Sch Land Sci & Technol, Beijing 100083, Peoples R China
Wang, Yuebin
Affiliation: China Univ Geosci, Sch Land Sci & Technol, Beijing 100083, Peoples R China
Zhang, Liqiang
Affiliation: Beijing Normal Univ, Beijing Key Lab Environm Remote Sensing & Digital, Fac Geog Sci, Beijing 100875, Peoples R China
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62