Deep Self-Supervised Representation Learning for Free-Hand Sketch

Cited by: 30

Authors:

Xu, Peng [1]

Song, Zeyu [2]

Yin, Qiyue [3]

Song, Yi-Zhe [4]

Wang, Liang [3]

Affiliations:

[1] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore 639798, Singapore

[2] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing 100876, Peoples R China

[3] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China

[4] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford GU2 7XH, Surrey, England

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2021 / Vol. 31 / No. 04

Keywords:

Feature extraction; Task analysis; Strain; Computer architecture; Deep learning; Deformable models; Convolution; Self-supervised; representation learning; deep learning; sketch; pretext task; textual convolution network; convolutional neural network;

DOI:

10.1109/TCSVT.2020.3003048

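Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]

Discipline classification codes:

0808; 0809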

Abstract:

In this paper, we tackle, for the first time, the problem of self-supervised representation learning for free-hand sketches. This addresses a common problem faced by the sketch community: annotated supervisory data are difficult to obtain. The problem is very challenging because sketches are highly abstract and subject to different drawing styles, making existing solutions tailored for photos unsuitable. The key to the success of our self-supervised learning paradigm lies in our sketch-specific designs: (i) we propose a set of pretext tasks specifically designed for sketches that mimic different drawing styles, and (ii) we further exploit the textual convolution network (TCN) together with the convolutional neural network (CNN) in a dual-branch architecture for sketch feature learning, as a means to accommodate the sequential stroke nature of sketches. We demonstrate the superiority of our sketch-specific designs through two sketch-related applications (retrieval and recognition) on a million-scale sketch dataset, and show that the proposed approach outperforms state-of-the-art unsupervised representation learning methods and significantly narrows the performance gap with supervised representation learning. PyTorch code of this work is available at https://github.com/zzz1515151/self-supervised_learning_sketch.
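As a rough illustration of what a sketch-oriented pretext task can look like, the following minimal Python sketch builds self-labelled training examples from stroke data: a transformation is applied to a sketch and the index of that transformation serves as the label a network would be trained to predict. The abstract does not list the actual transformations, so the rotation and horizontal-squeeze operations, the `TRANSFORMS` list, and all function names here are assumptions made for illustration only; the authors' implementation is in the repository linked above.

```python
import math
import random

# A sketch is modelled as a list of strokes; each stroke is a list of (x, y) points.
# This representation is an assumption made for the example.

def rotate(sketch, degrees):
    """Rotate every point of the sketch about the origin."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [[(x * c - y * s, x * s + y * c) for x, y in stroke] for stroke in sketch]

def squeeze_x(sketch, factor):
    """Scale x coordinates, a crude stand-in for a change in drawing style."""
    return [[(x * factor, y) for x, y in stroke] for stroke in sketch]

# Hypothetical transformation set; the index in this list is the pretext label.
TRANSFORMS = [
    lambda s: [list(stroke) for stroke in s],  # identity (copy)
    lambda s: rotate(s, 90),
    lambda s: rotate(s, 180),
    lambda s: squeeze_x(s, 0.5),
]

def make_pretext_example(sketch, rng):
    """Return (transformed_sketch, label) without using any human annotation."""
    label = rng.randrange(len(TRANSFORMS))
    return TRANSFORMS[label](sketch), label

if __name__ == "__main__":
    sketch = [[(0.0, 0.0), (1.0, 0.0)], [(1.0, 0.0), (1.0, 1.0)]]
    transformed, label = make_pretext_example(sketch, random.Random(0))
    print(label, transformed)
```

A classifier trained to recover `label` from `transformed` needs no manual annotation, which is the general idea behind self-supervised pretext tasks; how the paper's dual-branch TCN and CNN model consumes such examples is not covered by this sketch.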


Pages: 1503-1513

Number of pages: 11

References: 58 in total
