Link the head to the "beak": Zero Shot Learning from Noisy Text Description at Part Precision

Citations: 73
Authors
Elhoseiny, Mohamed [1 ,2 ]
Zhu, Yizhe [1 ]
Zhang, Han [1 ]
Elgammal, Ahmed [1 ]
Affiliations
[1] Rutgers State Univ, Dept Comp Sci, New Brunswick, NJ 08901 USA
[2] Facebook AI Res, New Brunswick, NJ USA
Source
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) | 2017
Funding
National Science Foundation (USA);
DOI
10.1109/CVPR.2017.666
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we study learning visual classifiers from unstructured text descriptions at part precision with no training images. We propose a learning framework that connects text terms to their relevant parts and suppresses connections to non-visual text terms, without any part-text annotations. For instance, this learning process enables terms like "beak" to be sparsely linked to the visual representation of parts like the head, while reducing the effect of non-visual terms like "migrate" on classifier prediction. Images are encoded by a part-based CNN that detects bird parts and learns part-specific representations. Part-based visual classifiers are predicted from text descriptions of unseen visual classes to enable classification without training images (also known as zero-shot recognition). We performed experiments on the CUBirds 2011 dataset and improved the state-of-the-art text-based zero-shot recognition result from 34.7% to 43.6%. We also created large-scale benchmarks on North American Bird Images augmented with text descriptions, on which our approach also outperforms existing methods. Our code, data, and models are publicly available [1].
Pages: 6288-6297
Number of pages: 10