Style-Aware Contrastive Learning for Multi-Style Image Captioning

Cited by: 0

Authors:

Zhou, Yucheng [1]

Long, Guodong [1]

Affiliations:

[1] Univ Technol Sydney, Australian AI Inst, Sch Comp Sci, FEIT, Sydney, NSW, Australia

Source:

17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023 | 2023

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Existing multi-style image captioning methods show promising results in generating captions with accurate visual content and the desired linguistic style. However, they overlook the relationship between linguistic style and visual content. To overcome this drawback, we propose style-aware contrastive learning for multi-style image captioning. First, we present a style-aware visual encoder with contrastive learning to mine potential visual content relevant to style. Moreover, we propose a style-aware triplet contrast objective to distinguish whether the image, style, and caption match. To provide positive and negative samples for contrastive learning, we present three retrieval schemes (object-based retrieval, RoI-based retrieval, and triplet-based retrieval) and design a dynamic trade-off function to calculate retrieval scores. Experimental results demonstrate that our approach achieves state-of-the-art performance. In addition, we conduct an extensive analysis to verify the effectiveness of our method.
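The abstract does not give the exact form of the style-aware triplet contrast objective. As a rough illustration only, and assuming a generic InfoNCE-style loss (an assumption, not confirmed by the source), the idea of scoring one matched (image, style, caption) triplet against mismatched triplets might be sketched as follows. The function name `info_nce`, the `temperature` parameter, and the use of plain scalar match scores are all hypothetical.

```python
import math

def info_nce(pos_score, neg_scores, temperature=0.1):
    """Generic InfoNCE loss (assumed form, not the paper's stated objective).

    pos_score:  match score of a matched (image, style, caption) triplet.
    neg_scores: match scores of mismatched triplets, e.g. the same image
                and caption paired with a wrong style.
    Returns the negative log-probability of the positive under a softmax
    over all candidates.
    """
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]

# Hypothetical scores: the loss is small when the matched triplet
# scores higher than the mismatched ones, and large otherwise.
loss_good = info_nce(0.9, [0.1, 0.2])
loss_bad = info_nce(0.1, [0.9, 0.8])
```

Under this assumed form, minimizing the loss pushes matched triplets to score above mismatched ones; how the positives and negatives are actually retrieved and weighted is described in the paper itself, not in this sketch.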


Pages: 2257-2267

Number of pages: 11
