Deep ranking structural support vector machine for image tagging

Cited by: 10
Authors
Chen, Gang [1]
Xu, Ran [1]
Yang, Zhi [1]
Affiliations
[1] SUNY Buffalo, Dept Comp Sci & Engn, Buffalo, NY 14260 USA
Keywords
Image tagging; Maximum margin learning; Deep learning; Ranking
DOI
10.1016/j.patrec.2017.09.012
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Image tagging is an active research topic in computer vision and machine learning because of its wide applications in semantic search and image retrieval. Although recent approaches based on deep neural networks learn better representations and significantly boost performance, they do not exploit structural information among labels. For example, "sky" and "ground" are likely to appear together in the same outdoor scene. In this paper, we propose a deep ranking structural Support Vector Machine (RSSVM) for image tagging. We use deep learning for representation learning and propose a new ranking function over the learned features that incorporates label correlation. Specifically, we incorporate the global context information between labels into our ranking function and then formulate the multi-label problem as a ranking problem to handle structured output prediction. We transfer parameters from an existing convolutional neural network (CNN) model and add two additional fully connected layers to build our deep neural structure. We evaluate our method on three widely used datasets and show promising results over competitive baselines. © 2017 Elsevier B.V. All rights reserved.
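The idea of ranking labels with a label-correlation term can be illustrated with a small sketch. This is not the RSSVM formulation from the paper, which the abstract does not spell out. It is a generic pairwise ranking hinge loss, and the function names, the co-occurrence matrix `cooc`, the weight `lam`, and the toy scores are all assumptions made for illustration.

```python
from typing import List, Sequence


def context_scores(unary: Sequence[float],
                   cooc: Sequence[Sequence[float]],
                   lam: float = 0.5) -> List[float]:
    """Add a label-correlation term to per-label scores.

    unary[k] is a per-label score computed from learned features
    (hypothetical input); cooc[k][j] is an assumed co-occurrence
    weight between labels k and j; lam scales the context term.
    """
    n = len(unary)
    return [
        unary[k] + lam * sum(cooc[k][j] * unary[j] for j in range(n) if j != k)
        for k in range(n)
    ]


def pairwise_ranking_hinge(scores: Sequence[float],
                           positives: Sequence[int],
                           margin: float = 1.0) -> float:
    """Sum of max(0, margin - (s_pos - s_neg)) over all (pos, neg) label pairs."""
    pos = set(positives)
    negatives = [k for k in range(len(scores)) if k not in pos]
    return sum(
        max(0.0, margin - (scores[p] - scores[q]))
        for p in pos
        for q in negatives
    )


# Toy example: labels 0="sky", 1="ground", 2="keyboard" (illustrative only).
unary = [2.0, 0.5, 0.4]
cooc = [
    [0.0, 1.0, 0.0],  # "sky" co-occurs with "ground"
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
scores = context_scores(unary, cooc, lam=0.5)
loss_without_context = pairwise_ranking_hinge(unary, positives=[0, 1])
loss_with_context = pairwise_ranking_hinge(scores, positives=[0, 1])
```

In this toy case, the weak "ground" score is raised by the strong "sky" score through the co-occurrence term, so both true labels rank above "keyboard" by at least the margin and the loss drops to zero.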
Pages: 30-38
Page count: 9