Scene analysis by mid-level attribute learning using 2D LSTM networks and an application to web-image tagging

Cited by: 12
Authors
Byeon, Wonmin [1, 2]
Liwicki, Marcus [1]
Breuel, Thomas M. [1]
Affiliations
[1] Univ Kaiserslautern, D-67663 Kaiserslautern, Germany
[2] German Res Ctr Artificial Intelligence DFKI, D-67663 Kaiserslautern, Germany
Keywords
Recurrent neural network; LSTM; Mid-level attribute learning; Scene analysis; Web-image tagging
DOI
10.1016/j.patrec.2015.06.003
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes an approach to scene analysis based on supervised training of 2D Long Short-Term Memory recurrent neural networks (LSTM networks). Unlike previous methods, our approach requires no manual construction of feature hierarchies or incorporation of other prior knowledge. Rather, like deep learning approaches using convolutional networks, our recognition networks are trained directly on raw pixel values. However, in contrast to convolutional neural networks, our approach uses 2D LSTM networks at all levels. Our networks yield per-pixel mid-level classifications of input images; since training data for such applications is not available in large quantities, we describe an approach to generating artificial training data and then evaluate the trained networks on real-world images. Our approach performed significantly better than other methods, including Convolutional Neural Networks (ConvNet), while using two orders of magnitude fewer parameters. We further report an experiment on a recently published dataset, the outdoor scene attribute dataset, for a fair comparison of scene attribute learning; there, our approach yielded a significant performance improvement (ca. 21%). Finally, our approach is successfully applied to a real-world application, automatic web-image tagging. © 2015 Elsevier B.V. All rights reserved.
Pages: 23-29
Page count: 7