Large-scale Tag-based Font Retrieval with Generative Feature Learning

被引:25
作者
Chen, Tianlang [1 ,3 ]
Wang, Zhaowen [2 ]
Xu, Ning [2 ]
Jin, Hailin [2 ]
Luo, Jiebo [1 ]
机构
[1] Univ Rochester, Rochester, NY 14627 USA
[2] Adobe Res, San Jose, CA USA
[3] Adobe, San Jose, CA USA
来源
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) | 2019年
基金
美国国家科学基金会;
关键词
D O I
10.1109/ICCV.2019.00921
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Font selection is one of the most important steps in a design workflow. Traditional methods rely on ordered lists which require significant domain knowledge and are often difficult to use even for trained professionals. In this paper, we address the problem of large-scale tag-based font retrieval which aims to bring semantics to the font selection process and enable people without expert knowledge to use fonts effectively. We collect a large-scale font tagging dataset of high-quality professional fonts. The dataset contains nearly 20,000 fonts, 2,000 tags, and hundreds of thousands of font-tag relations. We propose a novel generative feature learning algorithm that leverages the unique characteristics of fonts. The key idea is that font images are synthetic and can therefore be controlled by the learning algorithm. We design an integrated rendering and learning process so that the visual feature from one image can be used to reconstruct another image with different text. The resulting feature captures important font design details while is robust to nuisance factors such as text. We propose a novel attention mechanism to re-weight the visual feature for joint visual-text modeling. We combine the feature and the attention mechanism in a novel recognition-retrieval model. Experimental results show that our method significantly outperforms the state-of-the-art for the important problem of large-scale tag-based font retrieval.
引用
收藏
页码:9115 / 9124
页数:10
相关论文
共 39 条
[1]   Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering [J].
Anderson, Peter ;
He, Xiaodong ;
Buehler, Chris ;
Teney, Damien ;
Johnson, Mark ;
Gould, Stephen ;
Zhang, Lei .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :6077-6086
[2]  
[Anonymous], 2017, P CVPR
[3]   Multi-Content GAN for Few-Shot Font Style Transfer [J].
Azadil, Samaneh ;
Fisher, Matthew ;
Kim, Vladimir ;
Wang, Zhaowen ;
Shechtman, Eli ;
Darrell, Trevor .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :7564-7573
[4]   Large-Scale Visual Font Recognition [J].
Chen, Guang ;
Yang, Jianchao ;
Jin, Hailin ;
Brandt, Jonathan ;
Shechtman, Eli ;
Agarwala, Aseem ;
Han, Tony X. .
2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, :3598-3605
[5]   Greedy function approximation: A gradient boosting machine [J].
Friedman, JH .
ANNALS OF STATISTICS, 2001, 29 (05) :1189-1232
[6]  
Frome A., 2013, Neural information processing systems
[7]   Generative Adversarial Networks [J].
Goodfellow, Ian ;
Pouget-Abadie, Jean ;
Mirza, Mehdi ;
Xu, Bing ;
Warde-Farley, David ;
Ozair, Sherjil ;
Courville, Aaron ;
Bengio, Yoshua .
COMMUNICATIONS OF THE ACM, 2020, 63 (11) :139-144
[8]   Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models [J].
Gu, Jiuxiang ;
Cai, Jianfei ;
Joty, Shafiq ;
Niu, Li ;
Wang, Gang .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :7181-7189
[9]  
He K., 2016, CVPR, DOI [10.1109/CVPR.2016.90, DOI 10.1109/CVPR.2016.90]
[10]  
Isola P., 2017, PREPRINT