Discovering Video Clusters from Visual Features and Noisy Tags

Authors
Vahdat, Arash [1]
Zhou, Guang-Tong [1]
Mori, Greg [1]
Affiliation
[1] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC V5A 1S6, Canada
Source
COMPUTER VISION - ECCV 2014, PT VI | 2014, Vol. 8694
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present an algorithm for automatically clustering tagged videos. Collections of tagged videos are commonplace; however, it is not trivial to discover video clusters therein. Direct methods that operate on visual features ignore the regularly available, valuable source of tag information. Clustering videos solely on these tags is error-prone, since the tags are typically noisy. To address these problems, we develop a structured model that considers the interaction between visual features, video tags and video clusters. We model tags from visual features, and correct noisy tags by checking visual appearance consistency. In the end, videos are clustered from the refined tags as well as the visual features. We learn the clustering through a max-margin framework, and demonstrate empirically that this algorithm can produce more accurate clustering results than baseline methods based on tags or visual features, or both. Further, qualitative results verify that the clustering results can discover sub-categories and more specific instances of a given video category.
Pages: 526 - 539
Number of pages: 14
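
Illustrative sketch
The three steps named in the abstract (model tags from visual features, correct noisy tags by visual consistency, cluster on refined tags plus visual features) can be loosely illustrated as follows. This is not the paper's max-margin structured model: ridge regression and plain k-means are swapped in as simple stand-ins, and every name and parameter here (`refine_tags`, `kmeans`, `cluster_videos`, `ridge`, `blend`, `tag_weight`) is a hypothetical choice made for this sketch only.

```python
import numpy as np

def refine_tags(X, T, ridge=1.0, blend=0.3):
    """Predict tags from visual features X and blend them with noisy tags T.

    Assumption: ridge regression stands in for the paper's tag model.
    X is (n_videos, n_features); T is a 0/1 matrix (n_videos, n_tags).
    """
    d = X.shape[1]
    # Closed-form ridge regression from visual features to tag indicators.
    W = np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ T)
    predicted = X @ W
    # A tag survives only if observed tag and visual prediction jointly agree.
    scores = blend * T + (1.0 - blend) * predicted
    return (scores >= 0.5).astype(float)

def kmeans(Z, k, n_iter=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [Z[0]]
    for _ in range(1, k):
        # Squared distance of every point to its nearest chosen center.
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(n_iter):
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

def cluster_videos(X, T, k, tag_weight=1.0):
    """Cluster videos on visual features concatenated with refined tags."""
    refined = refine_tags(X, T)
    Z = np.hstack([X, tag_weight * refined])
    return kmeans(Z, k), refined
```

Calling `cluster_videos(X, T, k)` returns one cluster label per video together with the refined tag matrix. In this sketch, `blend` controls how much the observed tags are trusted relative to the visual prediction; the paper instead learns these interactions jointly.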