Exploiting Web Images for Semantic Video Indexing Via Robust Sample-Specific Loss

Cited by: 106

Authors:

Yang, Yang [1]

Zha, Zheng-Jun [2]

Gao, Yue [3]

Zhu, Xiaofeng [4]

Chua, Tat-Seng [5]

Affiliations:

[1] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China

[2] Chinese Acad Sci, Inst Intelligent Machines, Hefei 230031, Peoples R China

[3] Tsinghua Univ, Dept Automat, Tsinghua Natl Lab Informat Sci & Technol, Beijing 100084, Peoples R China

[4] Univ N Carolina, Biomed Res Imaging Ctr, Chapel Hill, NC 27599 USA

[5] Natl Univ Singapore, Sch Comp, Singapore 117417, Singapore

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2014 / Vol. 16 / No. 6

Funding:

National Research Foundation, Singapore;

Keywords:

Robust; semantic video indexing; transfer learning; RELEVANCE; SCALE;

DOI:

10.1109/TMM.2014.2323014

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Semantic video indexing, also known as video annotation or video concept detection in the literature, has attracted significant attention in recent years. Owing to the scarcity of labeled training videos, most existing approaches can hardly achieve satisfactory performance. In this paper, we propose a novel semantic video indexing approach that exploits abundant user-tagged Web images to help learn robust semantic video indexing classifiers. Two major challenges are addressed: 1) noisy Web images with imprecise and/or incomplete tags; and 2) the domain difference between images and videos. Specifically, we first apply a non-parametric approach to estimate the probability that each image is correctly tagged and use it as a confidence score. We then develop a robust transfer video indexing (RTVI) model to learn reliable classifiers from a limited number of training videos together with the abundant user-tagged images. The RTVI model is equipped with a novel sample-specific robust loss function, which employs the confidence score of a Web image as prior knowledge to suppress the influence and control the contribution of that image in the learning process. Meanwhile, the RTVI model discovers an optimal kernel space in which the mismatch between images and videos is minimized, thereby tackling the domain difference problem. In addition, we devise an iterative algorithm to optimize the proposed RTVI model and provide a theoretical analysis of its convergence. Extensive experiments on various real-world multimedia collections demonstrate the effectiveness of the proposed robust semantic video indexing approach.

Citation

Pages: 1677-1689

Page count: 13
