DoodleFormer: Creative Sketch Drawing with Transformers

被引:11
作者
Bhunia, Ankan Kumar [1 ]
Khan, Salman [1 ,2 ]
Cholakkal, Hisham [1 ]
Anwer, Rao Muhammad [1 ,3 ]
Khan, Fahad Shahbaz [1 ,4 ]
Laaksonen, Jorma [3 ]
Felsberg, Michael [4 ]
机构
[1] Mohamed bin Zayed Univ AI, Abu Dhabi, U Arab Emirates
[2] Australian Natl Univ, Canberra, ACT, Australia
[3] Aalto Univ, Espoo, Finland
[4] Linkoping Univ, Linkoping, Sweden
来源
COMPUTER VISION - ECCV 2022, PT XVII | 2022年 / 13677卷
关键词
D O I
10.1007/978-3-031-19790-1_21
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Creative sketching or doodling is an expressive activity, where imaginative and previously unseen depictions of everyday visual objects are drawn. Creative sketch image generation is a challenging vision problem, where the task is to generate diverse, yet realistic creative sketches possessing the unseen composition of the visual-world objects. Here, we propose a novel coarse-to-fine two-stage framework, DoodleFormer, that decomposes the creative sketch generation problem into the creation of coarse sketch composition followed by the incorporation of fine-details in the sketch. We introduce graph-aware transformer encoders that effectively capture global dynamic as well as local static structural relations among different body parts. To ensure diversity of the generated creative sketches, we introduce a probabilistic coarse sketch decoder that explicitly models the variations of each sketch body part to be drawn. Experiments are performed on two creative sketch datasets: Creative Birds and Creative Creatures. Our qualitative, quantitative and human-based evaluations show that DoodleFormer outperforms the state-of-the-art on both datasets, yielding realistic and diverse creative sketches. On Creative Creatures, DoodleFormer achieves an absolute gain of 25 in Frechet inception distance (FID) over state-of-the-art. We also demonstrate the effectiveness of DoodleFormer for related applications of text to creative sketch generation, sketch completion and house layout generation. Code is available at: https://github.com/ ankanbhunia/doodleformer.
引用
收藏
页码:338 / 355
页数:18
相关论文
共 39 条
[1]  
Abu-Aisheh Zeina, 2015, 4th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2015). Proceedings, P271
[2]  
Bishop C.M., 1994, Mixture density networks
[3]  
Cao N., 2019, AAAI
[4]   End-to-End Object Detection with Transformers [J].
Carion, Nicolas ;
Massa, Francisco ;
Synnaeve, Gabriel ;
Usunier, Nicolas ;
Kirillov, Alexander ;
Zagoruyko, Sergey .
COMPUTER VISION - ECCV 2020, PT I, 2020, 12346 :213-229
[5]  
Chen YJ, 2017, Arxiv, DOI arXiv:1709.04121
[6]  
Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
[7]  
Ge S., 2021, ICLR
[8]  
Graves A, 2014, Arxiv, DOI arXiv:1308.0850
[9]  
Ha D., 2018, INT C LEARNING REPRE
[10]   Deep Residual Learning for Image Recognition [J].
He, Kaiming ;
Zhang, Xiangyu ;
Ren, Shaoqing ;
Sun, Jian .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :770-778