Unsupervised mining of visually consistent shots for sports genre categorization over large-scale database

Cited by: 1
Authors
Dong, Yuan [1]
Zhao, Nan [1]
Lian, Shiguo [3]
Cen, Shusheng [1]
Liu, Wei [2]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing 100876, Peoples R China
[2] France Telecom Res & Dev Orange Lab Beijing, Beijing 100190, Peoples R China
[3] Huawei Cent Res Inst, Beijing 100086, Peoples R China
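Funding
National High Technology Research and Development Program of China (863 Program); National Natural Science Foundation of China
Keywords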
Video semantic content analysis; Video summarization; Sports genre categorization; CLASSIFICATION; RETRIEVAL; FEATURES; SCHEME
DOI
10.1007/s11235-014-9943-y
Chinese Library Classification number
TN [Electronic technology, communication technology]
Discipline classification code
0809
Abstract
In this paper, an algorithm is proposed to summarize sports videos based on viewpoints in TV broadcasts for sports genre classification. The redundancy of multiple views is one of the principal limitations in sports genre classification. To remove this redundancy, the algorithm chooses the most representative subset of shots from each game. After the videos are segmented into shots, a single keyframe is used to represent each shot, and a uniform LBP (local binary pattern) feature is extracted from each keyframe. Agglomerative hierarchical clustering is then performed on these keyframes. In this step, an energy-based function for clusters is introduced to match the statistical distribution of the various views, and a refined distance metric is proposed as the similarity measure between two shots. The energy function is modified to reflect the fact that temporally neighboring shots of similar duration are more likely to belong to the same view. To make full use of the high overlap within the selected keyframe subset, sparse coding and geometry visual phrases are introduced in the sports genre categorization stage. The method is evaluated on videos recorded from Orangesports, ESPN and Eurosport TV broadcasts. The average accuracy over 10 sports reaches 87.5%. The proposed algorithm has already been applied in the Orange TV video content delinearization service platform.
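The keyframe description and clustering steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the energy-based function, the refined distance metric and the temporal/duration terms are not specified in the abstract, so plain average-linkage clustering with a chi-square histogram distance and a fixed stopping threshold is swapped in as an assumption. All function names, the 3x3 neighbourhood and the toy images are hypothetical.

```python
from itertools import combinations

# Offsets of the 8 neighbours, in circular order around the centre pixel.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def transitions(code):
    """Count 0/1 transitions in the circular 8-bit pattern."""
    bits = [(code >> i) & 1 for i in range(8)]
    return sum(bits[i] != bits[(i + 1) % 8] for i in range(8))

# Each of the 58 uniform codes (at most 2 transitions) gets its own bin;
# all non-uniform codes share the last bin (index 58).
UNIFORM_BIN = {}
for code in range(256):
    if transitions(code) <= 2:
        UNIFORM_BIN[code] = len(UNIFORM_BIN)

def lbp_histogram(image):
    """Normalised 59-bin uniform LBP histogram of a grayscale image (list of rows)."""
    hist = [0.0] * 59
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            code = 0
            for bit, (dy, dx) in enumerate(NEIGHBOURS):
                if image[y + dy][x + dx] >= image[y][x]:
                    code |= 1 << bit
            hist[UNIFORM_BIN.get(code, 58)] += 1
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def chi_square(p, q):
    """Chi-square distance between two normalised histograms (assumed metric)."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)

def cluster_shots(histograms, threshold):
    """Average-linkage agglomerative clustering of keyframe histograms.

    Merging stops once the closest pair of clusters is farther apart than
    `threshold` (an assumed stopping rule, not the paper's energy function).
    """
    clusters = [[i] for i in range(len(histograms))]

    def linkage(a, b):
        total = sum(chi_square(histograms[i], histograms[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        if linkage(clusters[a], clusters[b]) > threshold:
            break
        clusters[a] += clusters[b]  # a < b, so deleting b keeps index a valid
        del clusters[b]
    return clusters

# Toy keyframes: two flat images and two horizontal ramps (hypothetical data).
flat_dark = [[10] * 5 for _ in range(5)]
flat_bright = [[200] * 5 for _ in range(5)]
ramp = [[x for x in range(5)] for _ in range(5)]
ramp_steep = [[3 * x for x in range(5)] for _ in range(5)]

shots = [flat_dark, ramp, flat_bright, ramp_steep]
clusters = cluster_shots([lbp_histogram(s) for s in shots], threshold=0.5)
print(clusters)  # [[0, 2], [1, 3]]
```

Because LBP codes depend only on the sign of local intensity differences, the two flat keyframes share one histogram and the two ramps share another, so the sketch groups them by texture rather than brightness. A representative shot per cluster (for example, the medoid) could then be kept as the summary.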
Pages: 381-391
Number of pages: 11