Style-Aware Contrastive Learning for Multi-Style Image Captioning

Cited by: 0

Authors:

Zhou, Yucheng [1]

Long, Guodong [1]

Affiliations:

[1] Univ Technol Sydney, Australian AI Inst, Sch Comp Sci, FEIT, Sydney, NSW, Australia

Source:

17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023 | 2023

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Existing multi-style image captioning methods show promising results in generating a caption with accurate visual content and the desired linguistic style. However, these methods overlook the relationship between linguistic style and visual content. To overcome this drawback, we propose style-aware contrastive learning for multi-style image captioning. First, we present a style-aware visual encoder with contrastive learning to mine potential visual content relevant to style. Moreover, we propose a style-aware triplet contrast objective to distinguish whether the image, style and caption are matched. To provide positive and negative samples for contrastive learning, we present three retrieval schemes: object-based retrieval, RoI-based retrieval and triplet-based retrieval, and design a dynamic trade-off function to calculate retrieval scores. Experimental results demonstrate that our approach achieves state-of-the-art performance. In addition, we conduct an extensive analysis to verify the effectiveness of our method.
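
The abstract does not give the form of the contrastive objective, so the following is only a generic illustration of contrastive learning, not the authors' loss. It sketches a standard InfoNCE-style loss in which a hypothetical fused image-and-style embedding (`anchor`) is pulled toward a matching caption embedding (`positive`) and pushed away from mismatched ones (`negatives`). The function names, the use of cosine similarity, and the `temperature` value are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss (an assumption, not the paper's objective).

    anchor:    hypothetical fused image+style embedding
    positive:  embedding of a caption that matches the anchor
    negatives: embeddings of captions that do not match
    """
    # Scaled similarities; index 0 holds the positive pair.
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    # Negative log-probability assigned to the positive pair.
    return log_sum - logits[0]


# Toy 2-d embeddings: a matched triplet versus a mismatched one.
loss_matched = info_nce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
loss_mismatched = info_nce_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In this toy setup the matched triplet yields a much smaller loss than the mismatched one, which is the behaviour a contrastive objective is meant to encourage.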


Pages: 2257-2267

Page count: 11
