A data-driven approach for tag refinement and localization in web videos

Times cited: 11
Authors
Ballan, Lamberto [1, 3]
Bertini, Marco [1]
Serra, Giuseppe [2]
Del Bimbo, Alberto [1]
Affiliations
[1] Univ Florence, MICC, I-50134 Florence, Italy
[2] Univ Modena & Reggio Emilia, Dipartimento Ingn Enzo Ferrari, I-41125 Modena, Italy
[3] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
Keywords
Video tagging; Web video; Tag refinement; Tag localization; Social media; Data-driven; Lazy learning; Concept detectors; Propagation; Image; Categories; Relevance
DOI
10.1016/j.cviu.2015.05.009
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Tagging of visual content is becoming more and more widespread as web-based services and social networks have popularized tagging functionalities among their users. These user-generated tags are used to ease browsing and exploration of media collections, e.g. using tag clouds, or to retrieve multimedia content. However, not all media are equally tagged by users. With current systems it is easy to tag a single photo, and even tagging a part of a photo, like a face, has become common on sites like Flickr and Facebook. Tagging a video sequence, on the other hand, is more complicated and time-consuming, so users tend to tag only the overall content of a video. In this paper we present a method for automatic video annotation that increases the number of tags originally provided by users and localizes them temporally, associating tags to keyframes. Our approach exploits the collective knowledge embedded in user-generated tags and web sources, as well as the visual similarity between keyframes and images drawn from social sites like YouTube and Flickr and from web sources like Google and Bing. Given a keyframe, our method selects "on the fly" from these visual sources the training exemplars that should be the most relevant for this test sample, and then transfers labels across similar images. Compared to existing video tagging approaches that require training a classifier for each tag, our system has few parameters, is easy to implement and can deal with an open-vocabulary scenario. We demonstrate the approach on tag refinement and localization on DUT-WEBV, a large dataset of web videos, and show state-of-the-art results. (C) 2015 Elsevier Inc. All rights reserved.
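The label-transfer idea in the abstract (pick the most similar exemplars for each keyframe at test time, then let them vote on tags) can be illustrated with a minimal neighbor-voting sketch. This is not the authors' implementation: the cosine similarity, the k-nearest-neighbor selection, the prior-corrected vote, and the names `Exemplar`, `tag_keyframe` and `localize` are all assumptions made for illustration. Visual features are assumed to be precomputed vectors, and each exemplar is assumed to carry the tags under which it was retrieved (e.g. from Flickr, Google, Bing or YouTube).

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical exemplar: a precomputed visual feature vector plus its tags
# (assumed to be the query tags used to retrieve it from a web source).
Exemplar = Tuple[Sequence[float], List[str]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def tag_keyframe(keyframe: Sequence[float],
                 exemplars: List[Exemplar],
                 k: int = 5) -> Dict[str, float]:
    """Score tags for one keyframe by neighbor voting (lazy: no per-tag training).

    Assumed scoring: relevance(t) = votes(t) - k * prior(t), where votes(t) counts
    the k nearest exemplars labeled t and prior(t) is the fraction of all
    exemplars labeled t (this discounts tags that are frequent everywhere).
    """
    # Select the "training set" on the fly: the k exemplars most similar to this keyframe.
    ranked = sorted(exemplars, key=lambda ex: cosine(keyframe, ex[0]), reverse=True)
    neighbors = ranked[:k]
    votes = Counter(tag for _, tags in neighbors for tag in set(tags))
    # Tag frequency over the whole exemplar pool.
    freq = Counter(tag for _, tags in exemplars for tag in set(tags))
    n = len(exemplars)
    return {t: votes[t] - k * freq[t] / n for t in freq}


def localize(keyframes: List[Sequence[float]],
             exemplars: List[Exemplar],
             k: int = 5,
             threshold: float = 0.0) -> List[List[str]]:
    """For each keyframe (in temporal order), return the tags scoring above threshold."""
    result = []
    for kf in keyframes:
        scores = tag_keyframe(kf, exemplars, k)
        result.append(sorted(t for t, s in scores.items() if s > threshold))
    return result
```

In this sketch nothing is trained per tag, so supporting a new tag only requires adding exemplars labeled with it, which mirrors the open-vocabulary property claimed in the abstract. Reading off per-keyframe tags with `localize` is one simple way to obtain a temporal localization; the threshold and `k` are illustrative parameters.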
Pages: 58-67
Number of pages: 10