Crowdsourced Time-sync Video Tagging using Temporal and Personalized Topic Modeling

Cited by: 45

Authors:

Wu, Bin [1]

Zhong, Erheng [1]

Tan, Ben [1]

Horner, Andrew [1]

Yang, Qiang [1,2]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China

[2] Huawei, Noahs Ark Lab, Hong Kong, Peoples R China

Source:

PROCEEDINGS OF THE 20TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD'14) | 2014

Keywords:

Video tagging; crowdsourcing; topic modeling; temporal and personalized model;

DOI:

10.1145/2623330.2623625

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Time-sync video tagging aims to automatically generate tags for each video shot. It can improve the user's experience in previewing a video's timeline structure compared to traditional schemes that tag an entire video clip. In this paper, we propose a new application which extracts time-sync video tags by automatically exploiting crowdsourced comments from video websites such as Nico Nico Douga, where videos are commented on by online crowd users in a time-sync manner. The challenge of the proposed application is that users with bias interact with one another frequently and bring noise into the data, while the comments are too sparse to compensate for the noise. Previous techniques are unable to handle this task well as they consider video semantics independently, which may overfit the sparse comments in each shot and thus fail to provide accurate modeling. To resolve these issues, we propose a novel temporal and personalized topic model that jointly considers temporal dependencies between video semantics, users' interaction in commenting, and users' preferences as prior knowledge. Our proposed model shares knowledge across video shots via users to enrich the short comments, and peels off user interaction and user bias to solve the noisy-comment problem. Log-likelihood analyses and user studies on large datasets show that the proposed model outperforms several state-of-the-art baselines in video tagging quality. Case studies also demonstrate our model's capability of extracting tags from the crowdsourced short and noisy comments.

Pages: 721-730

Page count: 10
