StyleDiffusion: Controllable Disentangled Style Transfer via Diffusion Models

Cited by: 27
Authors
Wang, Zhizhong [1 ]
Zhao, Lei [1 ]
Xing, Wei [1 ]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
Source
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV | 2023
DOI
10.1109/ICCV51070.2023.00706
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Content and style (C-S) disentanglement is a fundamental problem and critical challenge of style transfer. Existing approaches based on explicit definitions (e.g., Gram matrix) or implicit learning (e.g., GANs) are neither interpretable nor easy to control, resulting in entangled representations and less satisfying results. In this paper, we propose a new C-S disentangled framework for style transfer without using previous assumptions. The key insight is to explicitly extract the content information and implicitly learn the complementary style information, yielding interpretable and controllable C-S disentanglement and style transfer. A simple yet effective CLIP-based style disentanglement loss coordinated with a style reconstruction prior is introduced to disentangle C-S in the CLIP image space. By further leveraging the powerful style removal and generative ability of diffusion models, our framework achieves results superior to the state of the art, along with flexible C-S disentanglement and trade-off control. Our work provides new insights into C-S disentanglement in style transfer and demonstrates the potential of diffusion models for learning well-disentangled C-S characteristics.
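The abstract's "CLIP-based style disentanglement loss" measures agreement between embedding-space directions. A minimal sketch of that idea is shown below, under loud assumptions: `encode` is a hypothetical stand-in for CLIP's image encoder (flatten + L2-normalize here, not the real model), and the specific pairing of direction vectors is an illustrative directional loss, not the paper's exact formulation.

```python
import numpy as np

def encode(x):
    # Hypothetical stand-in for a CLIP image encoder: flatten the
    # "image" and L2-normalize it into a unit embedding vector.
    v = np.asarray(x, dtype=np.float64).ravel()
    return v / (np.linalg.norm(v) + 1e-8)

def style_disentanglement_loss(stylized, content, style, style_removed):
    """Directional loss in embedding space: the shift from the content
    image to the stylized output should align with the shift from the
    style-removed style image to the style image itself.
    Returns 1 - cosine similarity, so 0 means perfectly aligned."""
    d_out = encode(stylized) - encode(content)        # output direction
    d_sty = encode(style) - encode(style_removed)     # style direction
    cos = np.dot(d_out, d_sty) / (
        np.linalg.norm(d_out) * np.linalg.norm(d_sty) + 1e-8
    )
    return 1.0 - cos
```

With identical direction pairs the loss is near 0; with opposed directions it approaches 2, giving a bounded, differentiable training signal when `encode` is a real (frozen) image encoder.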
Pages: 7643-7655
Page count: 13
Related Papers (50 total)
  • [1] Multi-Source Training-Free Controllable Style Transfer via Diffusion Models
    Yu, Cuihong
    Han, Cheng
    Zhang, Chao
    SYMMETRY-BASEL, 2025, 17 (02)
  • [2] Musical Composition Style Transfer via Disentangled Timbre Representations
    Hung, Yun-Ning
    Chiang, I-Tung
    Chen, Yi-An
    Yang, Yi-Hsuan
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 4697 - 4703
  • [3] Controllable Cardiac Synthesis via Disentangled Anatomy Arithmetic
    Thermos, Spyridon
    Liu, Xiao
    O'Neil, Alison
    Tsaftaris, Sotirios A.
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2021, PT III, 2021, 12903 : 160 - 170
  • [4] Inversion-based Style Transfer with Diffusion Models
    Zhang, Yuxin
    Huang, Nisha
    Tang, Fan
    Huang, Haibin
    Ma, Chongyang
    Dong, Weiming
    Xu, Changsheng
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 10146 - 10156
  • [5] User-Controllable Arbitrary Style Transfer via Entropy Regularization
    Cheng, Jiaxin
    Wu, Yue
    Jaiswal, Ayush
    Zhang, Xu
    Natarajan, Pradeep
    Natarajan, Prem
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 1, 2023, : 433 - 441
  • [6] Style Transformer: Unpaired Text Style Transfer without Disentangled Latent Representation
    Dai, Ning
    Liang, Jianze
    Qiu, Xipeng
    Huang, Xuanjing
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 5997 - 6007
  • [7] Controllable Conversation Generation with Conversation Structures via Diffusion Models
    Chen, Jiaao
    Yang, Diyi
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 7238 - 7251
  • [8] Computational Decomposition of Style for Controllable and Enhanced Style Transfer
    Li, Minchao
    Tu, Shikui
    Xu, Lei
    INTELLIGENCE SCIENCE AND BIG DATA ENGINEERING: BIG DATA AND MACHINE LEARNING, PT II, 2019, 11936 : 15 - 39
  • [9] Diffusion-Enhanced PatchMatch: A Framework for Arbitrary Style Transfer with Diffusion Models
    Hamazaspyan, Mark
    Navasardyan, Shant
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW, 2023, : 797 - 805
  • [10] Controllable Artistic Text Style Transfer via Shape-Matching GAN
    Yang, Shuai
    Wang, Zhangyang
    Wang, Zhaowen
    Xu, Ning
    Liu, Jiaying
    Guo, Zongming
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 4441 - 4450