Style-Aware Contrastive Learning for Multi-Style Image Captioning

Cited by: 0
Authors
Zhou, Yucheng [1 ]
Long, Guodong [1 ]
Affiliations
[1] Univ Technol Sydney, Australian AI Inst, Sch Comp Sci, FEIT, Sydney, NSW, Australia
Source
17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023 | 2023
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Existing multi-style image captioning methods show promising results in generating a caption with accurate visual content and desired linguistic style. However, existing methods overlook the relationship between linguistic style and visual content. To overcome this drawback, we propose style-aware contrastive learning for multi-style image captioning. First, we present a style-aware visual encoder with contrastive learning to mine potential visual content relevant to style. Moreover, we propose a style-aware triplet contrast objective to distinguish whether the image, style and caption are matched. To provide positive and negative samples for contrastive learning, we present three retrieval schemes: object-based retrieval, RoI-based retrieval and triplet-based retrieval, and design a dynamic trade-off function to calculate retrieval scores. Experimental results demonstrate that our approach achieves state-of-the-art performance. In addition, we conduct an extensive analysis to verify the effectiveness of our method.
Pages: 2257-2267
Page count: 11
Related papers
50 in total
  • [1] Style-aware two-stage learning framework for video captioning
    Ma, Yunchuan
    Zhu, Zheng
    Qi, Yuankai
    Beheshti, Amin
    Li, Ying
    Qing, Laiyun
    Li, Guorong
    KNOWLEDGE-BASED SYSTEMS, 2024, 301
  • [2] Cross-domain multi-style merge for image captioning
    Duan, Yiqun
    Wang, Zhen
    Li, Yi
    Wang, Jingya
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2023, 228
  • [3] MSCap: Multi-Style Image Captioning with Unpaired Stylized Text
    Guo, Longteng
    Liu, Jing
    Yao, Peng
    Li, Jiangwei
    Lu, Hanqing
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 4199 - 4208
  • [4] Multi-Model Style-Aware Diffusion Learning for Semantic Image Synthesis
    Niu, Yunfang
    Wu, Lingxiang
    Zhang, Yufeng
    Zhu, Yousong
    Zhu, Guibo
    Wang, Jinqiao
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (11)
  • [5] A Style-aware Discriminator for Controllable Image Translation
    Kim, Kunhee
    Park, Sanghun
    Jeon, Eunyeong
    Kim, Taehun
    Kim, Daijin
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 18218 - 18227
  • [6] Style-aware and multi-scale attention for face image completion
    Liu H.
    Li S.
    Zhu X.
    Sun H.
    Zhang J.
    Harbin Gongye Daxue Xuebao/Journal of Harbin Institute of Technology, 2022, 54 (05): 49 - 56
  • [7] Parallel Style-Aware Image Cloning for Artworks
    Zhao, Yandan
    Jin, Xiaogang
    Xu, Yingqing
    Zhao, Hanli
    Ai, Meng
    Zhou, Kun
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2015, 21 (02) : 229 - 240
  • [8] A style-aware network based on multi-task learning for multi-domain image normalization
    Zhao, Jing
    He, Yong-jun
    Shi, Zheng
    Qin, Jian
    Xie, Yi-ning
    VISUAL COMPUTER, 2025, 41 (01): 773 - 783
  • [9] StarEnhancer: Learning Real-Time and Style-Aware Image Enhancement
    Song, Yuda
    Qian, Hui
    Du, Xin
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 4106 - 4115
  • [10] Room Style Estimation for Style-Aware Recommendation
    Ataer-Cansizoglu, Esra
    Liu, Hantian
    Weiss, Tomer
    Mitra, Archi
    Dholakia, Dhaval
    Choi, Jae-Woo
    Wulin, Dan
    2019 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND VIRTUAL REALITY (AIVR), 2019: 267 - 270