TridentCap: Image-Fact-Style Trident Semantic Framework for Stylized Image Captioning

Cited by: 1
Authors
Wang, Lanxiao [1]
Qiu, Heqian [1]
Qiu, Benliu [1]
Meng, Fanman [1]
Wu, Qingbo [1]
Li, Hongliang [1]
Affiliations
[1] University of Electronic Science and Technology of China (UESTC), School of Information and Communication Engineering, Chengdu 611731, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Semantics; Decoding; Dogs; Task analysis; Feature extraction; Annotations; Visualization; Stylized image captioning; trident data; image-fact-style; multi-style captioning; pseudo labels filter; NETWORK;
DOI
10.1109/TCSVT.2023.3315133
Chinese Library Classification
TM [Electrical technology]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract
Stylized image captioning (SIC) aims to generate captions in a target style for images. The biggest challenge is that collecting and annotating stylized data is difficult and time-consuming. Most existing methods learn from massive factual captions or additional stylized book corpora independently to assist in generating stylized captions, which ignores the core relationships within existing image-fact-style trident data. In this paper, we propose TridentCap, a novel image-fact-style trident semantic framework for stylized image captioning, which includes an image-fact semantic fusion encoder (SFE) and a trident stylization decoder (TSD). Unlike existing methods, we directly mine the core relationships in image-fact-style trident data and use factual semantics and the image to build a cross-modal semantic feature space, achieving coherence between image and text. Specifically, SFE learns image-related prior language knowledge from factual text and leverages fine-grained region-level semantic correlations between the image and the factual text to align and integrate cross-modal semantic information. TSD is designed to decouple the dual-source fused semantic feature according to the target style to generate stylized captions. In addition, we design a pseudo labels filter (PLF) that obtains and expands massive image-fact-style trident data by building pseudo stylized annotations for all image-fact data in traditional caption datasets, which further strengthens stylized caption learning. PLF is a generic algorithm for addressing insufficient data and can be applied to any existing stylized captioning model. We conduct extensive experiments on the SentiCap and FlickrStyle datasets and achieve consistent improvements on almost all metrics. Our code will be released at: https://github.com/WangLanxiao/TridentCap_Code.
Pages: 3563-3575
Number of pages: 13