Automatic Image Annotation using Deep Learning Representations

Cited by: 97
Authors
Murthy, Venkatesh N. [1]
Maji, Subhransu [1]
Manmatha, R. [1]
Affiliations
[1] Univ Massachusetts, Sch Comp Sci, Amherst, MA 01003 USA
Source
ICMR'15: PROCEEDINGS OF THE 2015 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL | 2015
Keywords
Image annotation; deep learning; word embeddings; CCA
DOI
10.1145/2671188.2749391
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose simple and effective models for image annotation that use Convolutional Neural Network (CNN) features extracted from an image and word embedding vectors to represent its associated tags. Our first set of models is based on the Canonical Correlation Analysis (CCA) framework, which helps in modeling both views of the data: visual features (CNN features) and textual features (word embedding vectors). Results are reported for all three variants of the CCA models, namely linear CCA, kernel CCA, and CCA with k-nearest neighbor (CCA-KNN) clustering. The best results are obtained using CCA-KNN, which outperforms previous results on the Corel-5k and ESP-Game datasets and achieves comparable results on the IAPRTC-12 dataset. In our experiments we also evaluate CNN features in existing models, which brings out their advantages over dozens of handcrafted features. We also demonstrate that word embedding vectors perform better than binary vectors as a representation of the tags associated with an image. In addition, we compare the CCA model to a simple CNN-based linear regression model, which allows the CNN layers to be trained using back-propagation.
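The two-view idea in the abstract can be illustrated with a minimal sketch. This record gives no implementation details, so everything below is an assumption rather than the authors' method: random vectors stand in for CNN features and word embeddings, each image's textual view is taken to be the mean of its tag embeddings, linear CCA is solved by whitening plus an SVD, and the nearest-neighbor step simply lets the closest training images in the projected space vote on tags. The names `fit_linear_cca` and `knn_annotate` are hypothetical.

```python
import numpy as np

def inv_sqrt(C):
    # Inverse square root of a symmetric positive-definite matrix.
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def fit_linear_cca(X, Y, dim, reg=1e-3):
    # X: (n, dx) visual view, Y: (n, dy) textual view.
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    # Regularized covariances keep the whitening step well conditioned.
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the
    # (regularized) canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    Wx = Kx @ U[:, :dim]
    Wy = Ky @ Vt.T[:, :dim]
    return mx, my, Wx, Wy, s[:dim]

def knn_annotate(x_new, X_train, tag_matrix, mx, Wx, k=3, n_tags=2):
    # Project into the shared space, then let the k nearest training
    # images vote on tags (an assumed, simplified KNN step).
    P_train = (X_train - mx) @ Wx
    p_new = (x_new - mx) @ Wx
    dists = np.linalg.norm(P_train - p_new, axis=1)
    neighbors = np.argsort(dists)[:k]
    votes = tag_matrix[neighbors].sum(axis=0)
    return np.argsort(-votes)[:n_tags]

# Synthetic stand-in data (assumption: no real dataset is used here).
rng = np.random.default_rng(0)
n, n_vocab, dx, dy = 200, 5, 16, 8
tag_matrix = (rng.random((n, n_vocab)) < 0.3).astype(float)
tag_matrix[np.arange(n), rng.integers(0, n_vocab, n)] = 1.0  # >= 1 tag each
E = rng.normal(size=(n_vocab, dy))   # stand-in word embeddings
A = rng.normal(size=(n_vocab, dx))   # maps tags to stand-in "CNN" features
X = tag_matrix @ A + 0.1 * rng.normal(size=(n, dx))
Y = (tag_matrix @ E) / tag_matrix.sum(axis=1, keepdims=True)

mx, my, Wx, Wy, corrs = fit_linear_cca(X, Y, dim=4)

# Annotate a new synthetic image generated from tag 0 only.
x_query = A[0] + 0.1 * rng.normal(size=dx)
predicted = knn_annotate(x_query, X, tag_matrix, mx, Wx, k=5, n_tags=2)
```

Here `predicted` holds the indices of the two tags with the most neighbor votes. Replacing `Y` with the raw binary `tag_matrix` gives the binary-vector representation that the abstract contrasts with word embeddings.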
Pages: 603-606
Number of pages: 4
Related papers
15 in total
[1]  
Ballan L, 2014, P INT C MULT RETR, P73
[2]  
Feng SL, 2004, PROC CVPR IEEE, P1002
[3]   A Multi-View Embedding Space for Modeling Internet Images, Tags, and Their Semantics [J].
Gong, Yunchao ;
Ke, Qifa ;
Isard, Michael ;
Lazebnik, Svetlana .
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2014, 106 (02) :210-233
[4]   TagProp: Discriminative Metric Learning in Nearest Neighbor Models for Image Auto-Annotation [J].
Guillaumin, Matthieu ;
Mensink, Thomas ;
Verbeek, Jakob ;
Schmid, Cordelia .
2009 IEEE 12TH INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2009, :309-316
[5]   Canonical correlation analysis: An overview with application to learning methods [J].
Hardoon, DR ;
Szedmak, S ;
Shawe-Taylor, J .
NEURAL COMPUTATION, 2004, 16 (12) :2639-2664
[6]  
Jeon J., 2003, Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, P119, DOI 10.1145/860435.860459
[7]  
Jia Y, 2014, P 22 ACM INT C MULT, DOI 10.1145/2647868.2654889
[8]  
Kalayeh M. M., 2014, CVPR 14
[9]   ImageNet Classification with Deep Convolutional Neural Networks [J].
Krizhevsky, Alex ;
Sutskever, Ilya ;
Hinton, Geoffrey E. .
COMMUNICATIONS OF THE ACM, 2017, 60 (06) :84-90
[10]  
Makadia A, 2008, LECT NOTES COMPUT SC, V5304, P316, DOI 10.1007/978-3-540-88690-7_24