UNBALANCED LEARNING IN CONTENT-BASED IMAGE CLASSIFICATION AND RETRIEVAL

Cited by: 2
Authors
Piras, Luca [1]
Giacinto, Giorgio [1]
Affiliations
[1] Univ Cagliari, Dept Elect & Elect Eng, I-09123 Cagliari, Italy
Source
2010 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME 2010) | 2010
Keywords
Unbalanced Learning; Small Sample-Size; Artificial Pattern Injection; Image Retrieval; Image Classification; Relevance Feedback
DOI
10.1109/ICME.2010.5583045
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Subject classification code
081202; 0835
Abstract
Very large archives of digital images are now easy to produce, thanks to digital cameras available as standalone devices or embedded in a range of portable devices. A personal computer typically holds thousands of images, while the Internet can be seen as a very large repository. One of the most severe problems in classifying and retrieving images from very large repositories is that the number of elements belonging to each semantic class is very small compared to the number of images in the repository. As a consequence, an even smaller fraction of images per semantic class can be used as a training set in a classification problem, or as a query in a content-based image retrieval problem. In this paper we propose a technique that artificially increases the number of examples in the training set in order to improve learning capabilities, reducing the imbalance between the semantic class of interest and all other images. The proposed approach is tailored to classification and relevance feedback techniques based on the Nearest-Neighbor (NN) paradigm. New points in the feature space are created from the available training patterns so that they better represent the distribution of the semantic class of interest. These points are created according to the k-NN paradigm and take into account both relevant and non-relevant images with respect to the semantic class of interest. The proposed approach increases the generalization capability of NN techniques and mitigates the risk of over-training the classifier on a few patterns. Reported experiments show the effectiveness of the proposed technique in content-based image retrieval tasks, where the Nearest-Neighbor approach is used to exploit the user's relevance feedback. The improvement in precision and recall gained in a single feature space also exceeds the improvement attained by combining different feature spaces.
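The abstract does not give the injection procedure itself, so the following is only a rough sketch of the general idea, not the authors' algorithm. It assumes a SMOTE-style interpolation between each relevant pattern and one of its k nearest relevant neighbors, followed by a hypothetical rejection rule that discards any candidate lying closer to a non-relevant pattern than to a relevant one. The function names `euclidean` and `inject_patterns`, the parameters `k`, `per_seed` and `seed`, and the rejection rule are all illustrative assumptions.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inject_patterns(relevant, non_relevant, k=2, per_seed=2, seed=0):
    """Create artificial relevant patterns (illustrative sketch only).

    Assumption: candidates are interpolated between a relevant pattern
    and one of its k nearest relevant neighbors (SMOTE-style), and a
    candidate is kept only if its nearest relevant pattern is at least
    as close as its nearest non-relevant pattern.
    """
    rng = random.Random(seed)
    synthetic = []
    for i, p in enumerate(relevant):
        # k nearest relevant neighbors of p, excluding p itself
        others = [q for j, q in enumerate(relevant) if j != i]
        neighbors = sorted(others, key=lambda q: euclidean(p, q))[:k]
        if not neighbors:
            continue
        for _ in range(per_seed):
            q = rng.choice(neighbors)
            t = rng.random()  # position along the segment from p to q
            cand = tuple(a + t * (b - a) for a, b in zip(p, q))
            d_rel = min(euclidean(cand, r) for r in relevant)
            d_non = min((euclidean(cand, n) for n in non_relevant),
                        default=math.inf)
            # hypothetical filter: reject points in non-relevant territory
            if d_rel <= d_non:
                synthetic.append(cand)
    return synthetic


if __name__ == "__main__":
    relevant = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    non_relevant = [(5.0, 5.0), (6.0, 5.0)]
    for point in inject_patterns(relevant, non_relevant):
        print(point)
```

Under these assumptions the injected points stay on segments joining relevant patterns, and the filter uses the non-relevant patterns to keep them out of regions where a nearest-neighbor rule would already vote "non-relevant".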
Pages: 36-41
Page count: 6