StyleDiffusion: Controllable Disentangled Style Transfer via Diffusion Models

Cited by: 27
Authors
Wang, Zhizhong [1 ]
Zhao, Lei [1 ]
Xing, Wei [1 ]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
Source
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV | 2023
DOI
10.1109/ICCV51070.2023.00706
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Content and style (C-S) disentanglement is a fundamental problem and critical challenge of style transfer. Existing approaches based on explicit definitions (e.g., Gram matrix) or implicit learning (e.g., GANs) are neither interpretable nor easy to control, resulting in entangled representations and less satisfying results. In this paper, we propose a new C-S disentangled framework for style transfer without using previous assumptions. The key insight is to explicitly extract the content information and implicitly learn the complementary style information, yielding interpretable and controllable C-S disentanglement and style transfer. A simple yet effective CLIP-based style disentanglement loss coordinated with a style reconstruction prior is introduced to disentangle C-S in the CLIP image space. By further leveraging the powerful style removal and generative ability of diffusion models, our framework achieves results superior to the state of the art, along with flexible C-S disentanglement and trade-off control. Our work provides new insights into C-S disentanglement in style transfer and demonstrates the potential of diffusion models for learning well-disentangled C-S characteristics.
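The abstract's "CLIP-based style disentanglement loss" measures agreement between embedding-space directions. A minimal sketch of that idea is shown below, under loud assumptions: `encode` is a hypothetical stand-in for CLIP's image encoder (flatten + L2-normalize here, not the real model), and the specific pairing of direction vectors is an illustrative directional loss, not the paper's exact formulation.

```python
import numpy as np

def encode(x):
    # Hypothetical stand-in for a CLIP image encoder: flatten the
    # "image" and L2-normalize it into a unit embedding vector.
    v = np.asarray(x, dtype=np.float64).ravel()
    return v / (np.linalg.norm(v) + 1e-8)

def style_disentanglement_loss(stylized, content, style, style_removed):
    """Directional loss in embedding space: the shift from the content
    image to the stylized output should align with the shift from the
    style-removed style image to the style image itself.
    Returns 1 - cosine similarity, so 0 means perfectly aligned."""
    d_out = encode(stylized) - encode(content)        # output direction
    d_sty = encode(style) - encode(style_removed)     # style direction
    cos = np.dot(d_out, d_sty) / (
        np.linalg.norm(d_out) * np.linalg.norm(d_sty) + 1e-8
    )
    return 1.0 - cos
```

With identical direction pairs the loss is near 0; with opposed directions it approaches 2, giving a bounded, differentiable training signal when `encode` is a real (frozen) image encoder.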
Pages: 7643-7655
Page count: 13
Related Papers (50 total)
  • [1] Multi-Source Training-Free Controllable Style Transfer via Diffusion Models
    Yu, Cuihong
    Han, Cheng
    Zhang, Chao
    SYMMETRY-BASEL, 2025, 17 (02)
  • [2] Musical Composition Style Transfer via Disentangled Timbre Representations
    Hung, Yun-Ning
    Chiang, I-Tung
    Chen, Yi-An
    Yang, Yi-Hsuan
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 4697 - 4703
  • [3] Controllable Cardiac Synthesis via Disentangled Anatomy Arithmetic
    Thermos, Spyridon
    Liu, Xiao
    O'Neil, Alison
    Tsaftaris, Sotirios A.
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2021, PT III, 2021, 12903 : 160 - 170
  • [4] Inversion-based Style Transfer with Diffusion Models
    Zhang, Yuxin
    Huang, Nisha
    Tang, Fan
    Huang, Haibin
    Ma, Chongyang
    Dong, Weiming
    Xu, Changsheng
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 10146 - 10156
  • [5] User-Controllable Arbitrary Style Transfer via Entropy Regularization
    Cheng, Jiaxin
    Wu, Yue
    Jaiswal, Ayush
    Zhang, Xu
    Natarajan, Pradeep
    Natarajan, Prem
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 1, 2023, : 433 - 441
  • [6] Style Transformer: Unpaired Text Style Transfer without Disentangled Latent Representation
    Dai, Ning
    Liang, Jianze
    Qiu, Xipeng
    Huang, Xuanjing
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 5997 - 6007
  • [7] Controllable Conversation Generation with Conversation Structures via Diffusion Models
    Chen, Jiaao
    Yang, Diyi
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 7238 - 7251
  • [8] Computational Decomposition of Style for Controllable and Enhanced Style Transfer
    Li, Minchao
    Tu, Shikui
    Xu, Lei
    INTELLIGENCE SCIENCE AND BIG DATA ENGINEERING: BIG DATA AND MACHINE LEARNING, PT II, 2019, 11936 : 15 - 39
  • [9] Diffusion-Enhanced PatchMatch: A Framework for Arbitrary Style Transfer with Diffusion Models
    Hamazaspyan, Mark
    Navasardyan, Shant
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW, 2023, : 797 - 805
  • [10] Controllable Artistic Text Style Transfer via Shape-Matching GAN
    Yang, Shuai
    Wang, Zhangyang
    Wang, Zhaowen
    Xu, Ning
    Liu, Jiaying
    Guo, Zongming
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 4441 - 4450