Efficient visual attention based framework for extracting key frames from videos

Cited by: 132
Authors
Ejaz, Naveed [1]
Mehmood, Irfan [1]
Baik, Sung Wook [1]
Affiliations
[1] Sejong Univ, Coll Elect & Informat Engn, Seoul, South Korea
Keywords
Video summarization; Key frame extraction; Visual attention model; Visual saliency; RETRIEVAL;
DOI
10.1016/j.image.2012.10.002
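Chinese Library Classification (CLC) codes
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification codes
0808; 0809
Abstract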
The huge amount of video data on the internet requires efficient video browsing and retrieval strategies. One viable solution is to provide summaries of videos in the form of key frames. Video summarization based on visual attention modeling has come into use recently. In such schemes, visually salient frames are extracted as key frames on the basis of theories of human attention. Visual attention modeling schemes have proved effective in video summarization. However, their high computational cost limits their applicability in practical scenarios. In this context, this paper proposes an efficient key frame extraction method based on a visual attention model. The computational cost is reduced by using temporal gradient based dynamic visual saliency detection instead of traditional optical flow methods. Moreover, for static visual saliency, an effective method employing the discrete cosine transform is used. The static and dynamic visual attention measures are fused using a non-linear weighted fusion method. The experimental results indicate that the proposed method is not only efficient but also yields high-quality video summaries. (c) 2012 Elsevier B.V. All rights reserved.
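
The pipeline outlined in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not give the exact formulas, so every function name below is hypothetical. The static cue uses a DCT sign map as an assumed stand-in for the paper's DCT-based method, the fusion is an assumed weighted quadratic mean, and key frames are simply the top-k frames of the resulting attention curve. Frames are assumed to be grayscale float arrays with values in [0, 1].

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def static_score(frame):
    """Static cue (assumed stand-in): mean of a DCT sign-based saliency map."""
    ch, cw = dct_matrix(frame.shape[0]), dct_matrix(frame.shape[1])
    signs = np.sign(ch @ frame @ cw.T)   # 2-D DCT, keep only coefficient signs
    sal = (ch.T @ signs @ cw) ** 2       # inverse 2-D DCT, squared
    peak = sal.max()
    if peak > 0:
        sal = sal / peak                 # scale the map to [0, 1]
    return float(sal.mean())

def dynamic_score(prev, curr):
    """Dynamic cue: mean absolute temporal gradient between two frames."""
    return float(np.abs(curr - prev).mean())

def fuse(static, dynamic, w_static=0.5, w_dynamic=0.5):
    """Assumed non-linear fusion: weighted quadratic mean of the two cues."""
    return float(np.sqrt(w_static * static ** 2 + w_dynamic * dynamic ** 2))

def attention_curve(frames):
    """One fused attention value per frame."""
    scores = []
    for t, curr in enumerate(frames):
        # The first frame has no predecessor, so its dynamic cue is zero.
        d = dynamic_score(frames[t - 1], curr) if t > 0 else 0.0
        scores.append(fuse(static_score(curr), d))
    return scores

def key_frames(frames, k):
    """Indices of the k highest-scoring frames, in temporal order."""
    scores = attention_curve(frames)
    return sorted(np.argsort(scores)[-k:].tolist())
```

Calling key_frames(frames, k) on a list of equally sized frames returns k frame indices. The temporal gradient needs only one subtraction per pixel, which is the kind of saving over optical flow that the abstract refers to; the weights and the top-k selection rule are placeholders that would need tuning.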
Pages: 34-44
Page count: 11