Hierarchical spatial pyramid max pooling based on SIFT features and sparse coding for image classification

Cited by: 20
Authors
Han, Hong [1]
Han, Qiqiang [1]
Li, Xiaojun [1]
Gu, Jianyin [1]
Affiliation
[1] Xidian University, School of Electronic Engineering, Xi'an 710071, People's Republic of China
Funding
National Natural Science Foundation of China
DOI
10.1049/iet-cvi.2012.0145
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Good image representations are essential for many computer vision tasks. In this study, the authors propose a hierarchical spatial pyramid max pooling method based on scale-invariant feature transform (SIFT) features and sparse coding, which builds image representations through a hierarchical network. The method consists of three parts: SIFT feature extraction, sparse coding and spatial pyramid max pooling. To mimic the visual cortex, spatial pyramid max pooling is first performed on the original SIFT features within each image patch. This distils the features and extracts the most distinctive and significant feature of each local patch, called the SIFT-pooled feature, instead of using the original SIFT features directly as is usual. Next, a dictionary is trained on randomly sampled SIFT-pooled features with the K-singular value decomposition (K-SVD) algorithm, and all SIFT-pooled features are sparse-coded with the trained dictionary. Finally, spatial pyramid max pooling is carried out again at the image level on the sparse codes of all image patches. The image representation is built by concatenating the pooled features of every level. The authors apply the algorithm with a simple linear support vector machine (SVM) to image classification on three datasets: Caltech-101, Caltech-256 and 15-Scenes. The experimental results show that the authors' algorithm achieves competitive performance compared with recently published results.
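The three-stage pipeline described in the abstract can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation:

* SIFT extraction and K-SVD dictionary training are not reproduced. Descriptors and a dictionary are taken as given.
* Plain matching pursuit stands in for the sparse coding step.
* Pooling takes the element-wise maximum of absolute values.
* The function names, the pyramid levels (1x1 and 2x2 inside a patch; 1x1, 2x2 and 4x4 over the image) and the sparsity level are hypothetical choices.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Located = Tuple[float, float, Vector]  # (x, y, feature vector)


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def spatial_pyramid_max_pool(
    features: Sequence[Located],
    width: float,
    height: float,
    levels: Sequence[int] = (1, 2, 4),
) -> Vector:
    """Max-pool located feature vectors over a g x g grid for each g in `levels`.

    Each cell keeps the element-wise maximum of |feature| over the features
    falling inside it (empty cells stay zero). The cells of all levels are
    concatenated in the order given by `levels`.
    """
    if not features:
        raise ValueError("need at least one feature to pool")
    dim = len(features[0][2])
    pooled: Vector = []
    for g in levels:
        cells = [[0.0] * dim for _ in range(g * g)]
        for x, y, vec in features:
            # Clamp so features on the right/bottom border land in the last cell.
            cx = min(int(x / width * g), g - 1)
            cy = min(int(y / height * g), g - 1)
            cell = cells[cy * g + cx]
            for i, v in enumerate(vec):
                cell[i] = max(cell[i], abs(v))
        for cell in cells:
            pooled.extend(cell)
    return pooled


def matching_pursuit(signal: Sequence[float], atoms: Sequence[Vector], n_nonzero: int) -> Vector:
    """Greedy sparse code of `signal` over unit-norm `atoms` (plain matching pursuit)."""
    residual = list(signal)
    code = [0.0] * len(atoms)
    for _ in range(n_nonzero):
        # Correlation of every atom with the current residual.
        corrs = [sum(a * r for a, r in zip(atom, residual)) for atom in atoms]
        k = max(range(len(atoms)), key=lambda j: abs(corrs[j]))
        if corrs[k] == 0.0:
            break  # residual is orthogonal to every atom
        code[k] += corrs[k]
        residual = [r - corrs[k] * a for r, a in zip(residual, atoms[k])]
    return code


def image_representation(
    patches: Sequence[Tuple[float, float, Sequence[Located]]],
    patch_size: float,
    dictionary: Sequence[Vector],
    width: float,
    height: float,
    n_nonzero: int = 2,
    patch_levels: Sequence[int] = (1, 2),
    image_levels: Sequence[int] = (1, 2, 4),
) -> Vector:
    """Hypothetical three-stage pipeline.

    `patches` holds (center_x, center_y, descriptors) tuples, with descriptor
    positions given relative to the patch's top-left corner.
    """
    atoms = [normalize(a) for a in dictionary]
    codes: List[Located] = []
    for cx, cy, descriptors in patches:
        # Stage 1: SIFT-pooled feature of the patch.
        sift_pooled = spatial_pyramid_max_pool(descriptors, patch_size, patch_size, patch_levels)
        # Stage 2: sparse code of the SIFT-pooled feature.
        codes.append((cx, cy, matching_pursuit(sift_pooled, atoms, n_nonzero)))
    # Stage 3: image-level spatial pyramid max pooling of the sparse codes.
    return spatial_pyramid_max_pool(codes, width, height, image_levels)
```

With real 128-dimensional SIFT descriptors and patch levels (1, 2), each SIFT-pooled feature has 128 x (1 + 4) = 640 dimensions, so the dictionary atoms must have that length. The concatenated image vector returned by the last stage is what would then be passed to a linear SVM.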
Pages: 144-150
Number of pages: 7
References (27 in total)
[1] Aharon, Michal; Elad, Michael; Bruckstein, Alfred. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 2006, 54(11): 4311-4322.
[2] [Anonymous]. P 24 INT C MACH LEAR, 2007.
[3] [Anonymous]. INT C COMP VIS PATT.
[4] Bo, L. Neural Information Processing Systems, 2011: 2115.
[5] Boiman, O. INT C COMP VIS PATT, 2008.
[6] Boureau, Y.-L. P ICML 10 P 27 INT C, 2010: 111.
[7] Boureau, Y-Lan; Bach, Francis; LeCun, Yann; Ponce, Jean. Learning Mid-Level Features For Recognition. 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010: 2559-2566.
[8] Coates, A. P 28 INT C MACHINE L, 2011: 921.
[9] Gao, Shenghua; Tsang, Ivor Wai-Hung; Chia, Liang-Tien; Zhao, Peilin. Local Features Are Not Lonely - Laplacian Sparse Coding for Image Classification. 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010: 3555-3561.
[10] Griffin, G. Caltech-256 object category dataset, 2007.