GAN-Based Multi-Style Photo Cartoonization

Cited by: 23
Authors
Shu, Yezhi [1]
Yi, Ran [1]
Xia, Mengfei [1]
Ye, Zipeng [1]
Zhao, Wang [1]
Chen, Yang [2]
Lai, Yu-Kun [3]
Liu, Yong-Jin [1]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, MOE Key Lab Pervas Comp, BNRist, Beijing 100084, Peoples R China
[2] Tencent, Shenzhen 518057, Peoples R China
[3] Cardiff Univ, Sch Comp Sci & Informat, Cardiff CF10 3AT, Wales
Keywords
Training; Generative adversarial networks; Semantics; Image edge detection; Training data; Generators; Computer architecture; Style transfer; Cartoon styles; Multi-style transfer; Generative adversarial network
DOI
10.1109/TVCG.2021.3067201
CLC number
TP31 [Computer Software]
Discipline codes
081202; 0835
Abstract
Cartoons are a common art form in daily life, and automatically generating cartoon images from photos is highly desirable. However, state-of-the-art single-style methods can generate only one cartoon style from photos, and existing multi-style image style transfer methods still struggle to produce high-quality cartoon images because of the highly simplified and abstract nature of cartoons. In this article, we propose a novel multi-style generative adversarial network (GAN) architecture, called MS-CartoonGAN, which transforms photos into multiple cartoon styles. MS-CartoonGAN is trained using only unpaired photos and cartoon images of multiple styles. To achieve this, we propose (1) a hierarchical semantic loss with sparse regularization to retain semantic content and recover flat shading at different levels of abstraction, (2) a new edge-promoting adversarial loss for producing fine edges, and (3) a style loss that enhances the difference between output cartoon styles and makes the training process more stable. We also develop a multi-domain architecture in which the generator consists of a shared encoder and multiple decoders for different cartoon styles, together with multiple discriminators, one per style. Observing that cartoon images drawn by different artists have unique styles while sharing common characteristics, our shared network architecture exploits these common characteristics, achieving better cartoonization and greater efficiency than single-style cartoonization. We show that our multi-domain architecture theoretically guarantees that the desired multiple cartoon styles are produced. Through extensive experiments, including a user study, we demonstrate the superiority of the proposed method over state-of-the-art single-style and multi-style image style transfer methods.
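The multi-domain design described in the abstract (a shared encoder feeding one decoder per cartoon style, plus a separate discriminator per style) can be sketched in PyTorch as below. This is a minimal illustration, not the authors' implementation: the module names, layer counts, use of instance normalization, and the PatchGAN-style discriminator are all assumptions made for the sketch.

```python
# Minimal sketch of a shared-encoder / multi-decoder generator.
# All names and hyper-parameters are illustrative assumptions.
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Maps a photo to a style-agnostic feature map shared by all styles."""
    def __init__(self, in_ch=3, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 7, stride=1, padding=3),
            nn.InstanceNorm2d(base), nn.ReLU(inplace=True),
            nn.Conv2d(base, base * 2, 3, stride=2, padding=1),
            nn.InstanceNorm2d(base * 2), nn.ReLU(inplace=True),
            nn.Conv2d(base * 2, base * 4, 3, stride=2, padding=1),
            nn.InstanceNorm2d(base * 4), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)

class StyleDecoder(nn.Module):
    """One decoder per cartoon style; upsamples shared features to an image."""
    def __init__(self, out_ch=3, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(base * 4, base * 2, 3, stride=2,
                               padding=1, output_padding=1),
            nn.InstanceNorm2d(base * 2), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base * 2, base, 3, stride=2,
                               padding=1, output_padding=1),
            nn.InstanceNorm2d(base), nn.ReLU(inplace=True),
            nn.Conv2d(base, out_ch, 7, stride=1, padding=3),
            nn.Tanh(),
        )

    def forward(self, feats):
        return self.net(feats)

class MultiStyleGenerator(nn.Module):
    """Shared encoder + K style-specific decoders, selected by index."""
    def __init__(self, num_styles=3):
        super().__init__()
        self.encoder = SharedEncoder()
        self.decoders = nn.ModuleList([StyleDecoder() for _ in range(num_styles)])

    def forward(self, photo, style_idx):
        return self.decoders[style_idx](self.encoder(photo))

def make_discriminator(in_ch=3, base=64):
    """One critic per style; a PatchGAN-style head is an assumed choice."""
    return nn.Sequential(
        nn.Conv2d(in_ch, base, 4, stride=2, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(base * 2, 1, 4, stride=1, padding=1),  # per-patch real/fake logits
    )

# Usage (assumed 256x256 inputs): the style index picks the decoder branch.
G = MultiStyleGenerator(num_styles=3)
cartoon = G(torch.randn(1, 3, 256, 256), style_idx=1)
```

Because the encoder parameters are shared across all style branches, the common characteristics of cartoon images are learned once, which is what makes this design more parameter-efficient than training one full single-style network per style.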
Pages: 3376-3390
Page count: 15