TridentCap: Image-Fact-Style Trident Semantic Framework for Stylized Image Captioning

Cited by: 1
Authors
Wang, Lanxiao [1]
Qiu, Heqian [1]
Qiu, Benliu [1]
Meng, Fanman [1]
Wu, Qingbo [1]
Li, Hongliang [1]
Affiliation
[1] University of Electronic Science and Technology of China (UESTC), School of Information and Communication Engineering, Chengdu 611731, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Semantics; Decoding; Dogs; Task analysis; Feature extraction; Annotations; Visualization; Stylized image captioning; trident data; image-fact-style; multi-style captioning; pseudo labels filter; network
DOI
10.1109/TCSVT.2023.3315133
Chinese Library Classification (CLC) codes
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification codes
0808; 0809
Abstract
Stylized image captioning (SIC) aims to generate captions in a target style for images. The biggest challenge is that collecting and annotating stylized data is difficult and time-consuming. Most existing methods independently learn from massive numbers of factual captions or from additional stylized book corpora to assist stylized caption generation, ignoring the core relationships among the existing image-fact-style trident data. In this paper, we propose TridentCap, a novel image-fact-style trident semantic framework for stylized image captioning, which consists of an image-fact semantic fusion encoder (SFE) and a trident stylization decoder (TSD). Unlike existing methods, we directly mine the core relationships in image-fact-style trident data and use the factual semantics together with the image to build a cross-modal semantic feature space, achieving coherence between image and text. Specifically, SFE learns image-related prior language knowledge from factual text and leverages fine-grained, region-level semantic correlations between the image and the factual text to align and integrate cross-modal semantic information. TSD decouples the dual-source fused semantic feature according to the target style to generate the stylized caption. In addition, we design a pseudo labels filter (PLF) that obtains and expands massive image-fact-style trident data by building pseudo stylized annotations for all image-fact pairs in traditional caption datasets, which further strengthens stylized caption learning. PLF is a generic algorithm for addressing insufficient data and can be applied to any existing stylized captioning model. We conduct extensive experiments on the SentiCap and FlickrStyle datasets, where our method achieves consistent improvements on almost all metrics. Our code will be released at: https://github.com/WangLanxiao/TridentCap_Code.
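The abstract does not say how the PLF decides which pseudo stylized captions to keep. What follows is a minimal Python sketch of one plausible filtering scheme under stated assumptions. It is not the authors' implementation. It assumes each candidate caption comes with a confidence score from a separate style classifier. It uses word-level Jaccard overlap with the factual caption as a crude content-preservation gate, and it keeps the most confidently stylized survivor for each image and style. The names `Candidate`, `content_overlap` and `filter_pseudo_labels`, the classifier score, and both thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Candidate:
    """One pseudo-labelled sample (hypothetical structure, not from the paper)."""
    image_id: str
    factual: str        # factual caption from a traditional caption dataset
    stylized: str       # generated pseudo stylized caption
    style: str          # target style label, e.g. "positive"
    style_score: float  # assumed confidence from a separate style classifier, in [0, 1]


def content_overlap(factual: str, stylized: str) -> float:
    """Jaccard overlap of lowercase word sets: a crude content-preservation proxy."""
    a = set(factual.lower().split())
    b = set(stylized.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def filter_pseudo_labels(
    candidates: List[Candidate],
    style_threshold: float = 0.8,
    overlap_threshold: float = 0.3,
) -> List[Tuple[str, str, str, str]]:
    """Return (image_id, factual, stylized, style) tuples that pass both gates,
    keeping only the most confidently stylized survivor per (image_id, style)."""
    best: Dict[Tuple[str, str], Candidate] = {}
    for c in candidates:
        # Gate 1 (assumed): the caption should look like the target style.
        if c.style_score < style_threshold:
            continue
        # Gate 2 (assumed): the caption should still describe the factual content.
        if content_overlap(c.factual, c.stylized) < overlap_threshold:
            continue
        key = (c.image_id, c.style)
        # Among survivors for the same image and style, prefer higher style confidence.
        if key not in best or c.style_score > best[key].style_score:
            best[key] = c
    return [(c.image_id, c.factual, c.stylized, c.style) for c in best.values()]


if __name__ == "__main__":
    fact = "a dog runs on the grass"
    pool = [
        Candidate("img1", fact, "a happy dog runs on the lovely grass", "positive", 0.92),
        Candidate("img1", fact, "a sad dog", "positive", 0.40),  # fails the style gate
        Candidate("img2", "a man rides a bike", "what a wonderful sunny day", "positive", 0.95),  # fails the overlap gate
    ]
    for triplet in filter_pseudo_labels(pool):
        print(triplet)
```

Overlap is used only as a gate, and survivors are ranked by style confidence. This avoids favoring a candidate that simply copies the factual caption, which would have perfect overlap. Whether TridentCap's PLF uses comparable criteria is not stated in the abstract.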
Pages: 3563-3575
Page count: 13