What Does Classifying More Than 10,000 Image Categories Tell Us?

Cited by: 0
Authors
Deng, Jia [1, 3]
Berg, Alexander C. [2]
Li, Kai [1]
Li, Fei-Fei [3]
Affiliations
[1] Princeton Univ, Princeton, NJ 08544 USA
[2] Columbia Univ, New York, NY 10027 USA
[3] Stanford Univ, Stanford, CA 94305 USA
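Source
COMPUTER VISION-ECCV 2010, PT V | 2010 / Vol. 6315
Funding
National Science Foundation (USA);
Keywords
OBJECT; SCENE;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract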
Image classification is a critical task for both humans and computers. One of the challenges lies in the large scale of the semantic space. In particular, humans can recognize tens of thousands of object classes and scenes. No computer vision algorithm today has been tested at this scale. This paper presents a study of large-scale categorization, including a series of challenging experiments on classification with more than 10,000 image classes. We find that a) computational issues become crucial in algorithm design; b) conventional wisdom from a couple of hundred image categories on the relative performance of different classifiers does not necessarily hold when the number of categories increases; c) there is a surprisingly strong relationship between the structure of WordNet (developed for studying language) and the difficulty of visual categorization; d) classification can be improved by exploiting the semantic hierarchy. Toward the future goal of developing automatic vision algorithms to recognize tens of thousands or even millions of image categories, we make a series of observations and arguments about dataset scale, category density, and image hierarchy.
Pages: 71 / +
Page count: 4