Style-Aware Contrastive Learning for Multi-Style Image Captioning

Cited by: 0
Authors
Zhou, Yucheng [1]
Long, Guodong [1]
Affiliations
[1] Univ Technol Sydney, Australian AI Inst, Sch Comp Sci, FEIT, Sydney, NSW, Australia
Source
17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023 | 2023
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Existing multi-style image captioning methods show promising results in generating a caption with accurate visual content and desired linguistic style. However, existing methods overlook the relationship between linguistic style and visual content. To overcome this drawback, we propose style-aware contrastive learning for multi-style image captioning. First, we present a style-aware visual encoder with contrastive learning to mine potential visual content relevant to style. Moreover, we propose a style-aware triplet contrast objective to distinguish whether the image, style and caption are matched. To provide positive and negative samples for contrastive learning, we present three retrieval schemes: object-based retrieval, RoI-based retrieval and triplet-based retrieval, and design a dynamic trade-off function to calculate retrieval scores. Experimental results demonstrate that our approach achieves state-of-the-art performance. In addition, we conduct an extensive analysis to verify the effectiveness of our method.
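As an illustration only, the idea of a contrast objective over (image, style, caption) triplets can be sketched with a generic InfoNCE-style loss. This is not the authors' implementation: the abstract does not give the formula, so the additive image-style fusion, the cosine scoring, the temperature value and the function names `cosine`, `triplet_score` and `info_nce` are all assumptions made for this sketch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_score(image, style, caption):
    """Hypothetical matching score (an assumption, not from the paper):
    fuse image and style embeddings by addition, then compare the fused
    vector with the caption embedding."""
    fused = [i + s for i, s in zip(image, style)]
    return cosine(fused, caption)


def info_nce(pos_score, neg_scores, temperature=0.1):
    """Generic InfoNCE-style loss: small when the matched triplet
    scores higher than every mismatched (negative) triplet."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)  # subtract the max before exponentiating for stability
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


# Toy embeddings: the matched caption aligns with image + style,
# the mismatched caption is orthogonal to it.
image, style = [1.0, 0.0], [0.0, 1.0]
matched_caption, mismatched_caption = [1.0, 1.0], [1.0, -1.0]

pos = triplet_score(image, style, matched_caption)
neg = triplet_score(image, style, mismatched_caption)

loss_matched = info_nce(pos, [neg])   # matched triplet treated as positive
loss_swapped = info_nce(neg, [pos])   # mismatched triplet treated as positive
```

In this toy setup the loss is close to zero when the matched triplet is the positive and large when the roles are swapped, which is the behaviour a triplet contrast objective is meant to encourage. In the paper, the positive and negative samples come from the three retrieval schemes named in the abstract; here they are hand-written.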
Pages: 2257-2267
Page count: 11
Related papers
50 in total
  • [41] Unsupervised learning of style-aware facial animation from real acting performances
    Paier, Wolfgang
    Hilsmann, Anna
    Eisert, Peter
    GRAPHICAL MODELS, 2023, 129
  • [42] Contrastive Learning for Image Captioning
    Dai, Bo
    Lin, Dahua
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [43] The Communication Value of Multi-style Subtitles
    Zeng, Guangyu
    PROCEEDINGS OF THE 2017 2ND INTERNATIONAL CONFERENCE ON EDUCATION, SPORTS, ARTS AND MANAGEMENT ENGINEERING (ICESAME 2017), 2017, 123 : 685 - 690
  • [44] Multi-Style Generative Reading Comprehension
    Nishida, Kyosuke
    Saito, Itsumi
    Nishida, Kosuke
    Shinoda, Kazutoshi
    Otsuka, Atsushi
    Asano, Hisako
    Tomita, Junji
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 2273 - 2284
  • [45] Interactive Artistic Multi-style Transfer
    Wang, Xiaohui
    Lyu, Yiran
    Huang, Junfeng
    Wang, Ziying
    Qin, Jingyan
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2021, 14 (01)
  • [46] Interactive Artistic Multi-style Transfer
    Xiaohui Wang
    Yiran Lyu
    Junfeng Huang
    Ziying Wang
    Jingyan Qin
    International Journal of Computational Intelligence Systems, 14
  • [47] Classifier-guided multi-style tile image generation method
    Lu, Jianfeng
    Shi, Mengtao
    Song, Chuhua
    Zhao, Weihao
    Xi, Lifeng
    Emam, Mahmoud
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2024, 36 (01)
  • [48] Multi-speaker Multi-style Speech Synthesis with Timbre and Style Disentanglement
    Song, Wei
    Yue, Yanghao
    Zhang, Ya-jie
    Zhang, Zhengchen
    Wu, Youzheng
    He, Xiaodong
    MAN-MACHINE SPEECH COMMUNICATION, NCMMSC 2022, 2023, 1765 : 132 - 140
  • [49] Discriminative Style Learning for Cross-Domain Image Captioning
    Yuan, Jin
    Zhu, Shuai
    Huang, Shuyin
    Zhang, Hanwang
    Xiao, Yaoqiang
    Li, Zhiyong
    Wang, Meng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 1723 - 1736
  • [50] Multi-Style Unsupervised Image Synthesis Using Generative Adversarial Nets
    Lv, Guoyun
    Israr, Syed Muhammad
    Qi, Shengyong
    IEEE ACCESS, 2021, 9 : 86025 - 86036