Encouraging Disentangled and Convex Representation with Controllable Interpolation Regularization

Cited by: 2
Authors
Ge, Yunhao [1 ]
Xu, Zhi [1 ]
Xiao, Yao [1 ]
Xin, Gan [1 ]
Pang, Yunkui [1 ]
Itti, Laurent [1 ]
Affiliations
[1] Univ Southern Calif, Los Angeles, CA 90007 USA
Source
2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV) | 2023
DOI
10.1109/WACV56688.2023.00474
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
We focus on controllable disentangled representation learning (C-Dis-RL), where users can control the partition of the disentangled latent space to factorize dataset attributes (concepts) for downstream tasks. Two general problems remain under-explored in current methods: (1) They lack comprehensive disentanglement constraints, especially the minimization of mutual information between different attributes across the latent and observation domains. (2) They lack convexity constraints, which are important for meaningfully manipulating specific attributes in downstream tasks. To encourage both comprehensive C-Dis-RL and convexity simultaneously, we propose a simple yet efficient method: Controllable Interpolation Regularization (CIR), which creates a positive loop in which disentanglement and convexity help each other. Specifically, we conduct controlled interpolation in latent space during training and reuse the encoder to form a 'perfect disentanglement' regularization. In that case, (a) the disentanglement loss implicitly enlarges the potential understandable distribution to encourage convexity; (b) convexity in turn improves the robustness and precision of disentanglement. CIR is a general module, and we merge it with three different algorithms, ELEGANT, I2I-Dis, and GZS-Net, to demonstrate its compatibility and effectiveness. Qualitative and quantitative experiments show that CIR improves C-Dis-RL and latent convexity, which in turn improves downstream tasks: controllable image synthesis, cross-modality image translation, and zero-shot synthesis.
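The core mechanism described above — interpolate only a chosen attribute partition of the latent code, decode, then reuse the encoder to check that the interpolated partition landed on target while untouched partitions stayed fixed — can be illustrated with a minimal NumPy sketch. This is a toy illustration under stated assumptions, not the paper's implementation: the linear encoder/decoder pair, the partition slices, and the `cir_regularizer` helper are all hypothetical stand-ins for the actual networks and loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear encoder/decoder (hypothetical stand-ins for the paper's networks).
D, K = 8, 4                       # observation dim, latent dim
W = rng.standard_normal((K, D))   # encoder weight
W_dec = np.linalg.pinv(W)         # decoder = pseudo-inverse, so re-encoding is exact

def encode(x):
    return W @ x

def decode(z):
    return W_dec @ z

# Controlled latent partition: dims 0..1 hold attribute A, dims 2..3 hold attribute B.
attr_A = slice(0, 2)
attr_B = slice(2, 4)

def cir_regularizer(x1, x2, alpha=0.5):
    """Interpolate only attribute A between two samples, synthesize, re-encode,
    and penalize (a) the interpolated partition drifting off its target and
    (b) the untouched partition changing -- the 'perfect disentanglement' signal."""
    z1, z2 = encode(x1), encode(x2)
    z_mix = z1.copy()
    z_mix[attr_A] = (1 - alpha) * z1[attr_A] + alpha * z2[attr_A]
    z_re = encode(decode(z_mix))   # reuse the encoder on the synthesized sample
    loss_A = np.sum((z_re[attr_A] - z_mix[attr_A]) ** 2)  # attr A on its target
    loss_B = np.sum((z_re[attr_B] - z1[attr_B]) ** 2)     # attr B unchanged
    return loss_A + loss_B

x1, x2 = rng.standard_normal(D), rng.standard_normal(D)
print(cir_regularizer(x1, x2))  # near zero for this ideal linear autoencoder
```

In training, this regularizer would be minimized jointly with the base model's losses; a nonzero value signals that interpolating one attribute leaked into another partition, i.e., a disentanglement or convexity failure.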
Pages: 4750-4758 (9 pages)