DoodleFormer: Creative Sketch Drawing with Transformers

Cited by: 11

Authors:

Bhunia, Ankan Kumar [1]

Khan, Salman [1,2]

Cholakkal, Hisham [1]

Anwer, Rao Muhammad [1,3]

Khan, Fahad Shahbaz [1,4]

Laaksonen, Jorma [3]

Felsberg, Michael [4]

Affiliations:

[1] Mohamed bin Zayed Univ AI, Abu Dhabi, U Arab Emirates

[2] Australian Natl Univ, Canberra, ACT, Australia

[3] Aalto Univ, Espoo, Finland

[4] Linkoping Univ, Linkoping, Sweden

Source:

COMPUTER VISION - ECCV 2022, PT XVII | 2022 / Vol. 13677

DOI:

10.1007/978-3-031-19790-1_21

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Creative sketching or doodling is an expressive activity, where imaginative and previously unseen depictions of everyday visual objects are drawn. Creative sketch image generation is a challenging vision problem, where the task is to generate diverse, yet realistic creative sketches possessing unseen compositions of visual-world objects. Here, we propose a novel coarse-to-fine two-stage framework, DoodleFormer, that decomposes the creative sketch generation problem into the creation of a coarse sketch composition followed by the incorporation of fine details in the sketch. We introduce graph-aware transformer encoders that effectively capture global dynamic as well as local static structural relations among different body parts. To ensure diversity of the generated creative sketches, we introduce a probabilistic coarse sketch decoder that explicitly models the variations of each sketch body part to be drawn. Experiments are performed on two creative sketch datasets: Creative Birds and Creative Creatures. Our qualitative, quantitative and human-based evaluations show that DoodleFormer outperforms the state-of-the-art on both datasets, yielding realistic and diverse creative sketches. On Creative Creatures, DoodleFormer achieves an absolute gain of 25 in Frechet inception distance (FID) over the state-of-the-art. We also demonstrate the effectiveness of DoodleFormer for related applications of text to creative sketch generation, sketch completion and house layout generation. Code is available at: https://github.com/ankanbhunia/doodleformer.

Pages: 338-355

Page count: 18

