Style-Aware Contrastive Learning for Multi-Style Image Captioning

Times cited: 0

Authors:

Zhou, Yucheng [1]

Long, Guodong [1]

Affiliation:

[1] Univ Technol Sydney, Australian AI Inst, Sch Comp Sci, FEIT, Sydney, NSW, Australia

Source:

17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023 | 2023

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Existing multi-style image captioning methods show promising results in generating captions with accurate visual content and the desired linguistic style. However, they overlook the relationship between linguistic style and visual content. To address this drawback, we propose style-aware contrastive learning for multi-style image captioning. First, we present a style-aware visual encoder with contrastive learning to mine latent visual content relevant to the style. Second, we propose a style-aware triplet contrast objective to determine whether an image, a style, and a caption match. To provide positive and negative samples for contrastive learning, we present three retrieval schemes (object-based retrieval, RoI-based retrieval, and triplet-based retrieval) and design a dynamic trade-off function to calculate retrieval scores. Experimental results demonstrate that our approach achieves state-of-the-art performance, and an extensive analysis verifies the effectiveness of our method.


Pages: 2257-2267

Number of pages: 11

50 records in total

[1] Style-aware two-stage learning framework for video captioning
Ma, Yunchuan
Zhu, Zheng
Qi, Yuankai
Beheshti, Amin
Li, Ying
Qing, Laiyun
Li, Guorong
KNOWLEDGE-BASED SYSTEMS, 2024, 301
[2] Cross-domain multi-style merge for image captioning
Duan, Yiqun
Wang, Zhen
Li, Yi
Wang, Jingya
COMPUTER VISION AND IMAGE UNDERSTANDING, 2023, 228
[3] MSCap: Multi-Style Image Captioning with Unpaired Stylized Text
Guo, Longteng
Liu, Jing
Yao, Peng
Li, Jiangwei
Lu, Hanqing
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 4199-4208
[4] Multi-Model Style-Aware Diffusion Learning for Semantic Image Synthesis
Niu, Yunfang
Wu, Lingxiang
Zhang, Yufeng
Zhu, Yousong
Zhu, Guibo
Wang, Jinqiao
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (11)
[5] A Style-aware Discriminator for Controllable Image Translation
Kim, Kunhee
Park, Sanghun
Jeon, Eunyeong
Kim, Taehun
Kim, Daijin
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 18218-18227
[6] Style-aware and multi-scale attention for face image completion
Liu H.
Li S.
Zhu X.
Sun H.
Zhang J.
Harbin Gongye Daxue Xuebao/Journal of Harbin Institute of Technology, 2022, 54 (05): 49-56
[7] Parallel Style-Aware Image Cloning for Artworks
Zhao, Yandan
Jin, Xiaogang
Xu, Yingqing
Zhao, Hanli
Ai, Meng
Zhou, Kun
IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2015, 21 (02): 229-240
[8] A style-aware network based on multi-task learning for multi-domain image normalization
Zhao, Jing
He, Yong-jun
Shi, Zheng
Qin, Jian
Xie, Yi-ning
VISUAL COMPUTER, 2025, 41 (01): 773-783
[9] StarEnhancer: Learning Real-Time and Style-Aware Image Enhancement
Song, Yuda
Qian, Hui
Du, Xin
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 4106-4115
[10] Room Style Estimation for Style-Aware Recommendation
Ataer-Cansizoglu, Esra
Liu, Hantian
Weiss, Tomer
Mitra, Archi
Dholakia, Dhaval
Choi, Jae-Woo
Wulin, Dan
2019 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND VIRTUAL REALITY (AIVR), 2019: 267-270
