Key Frame Extraction in the Summary Space

Cited by: 37

Authors:

Li, Xuelong [1]

Zhao, Bin [2]

Lu, Xiaoqiang [1]

Affiliations:

[1] Chinese Acad Sci, Ctr Opt Imagery Anal & Learning, Xian Inst Opt & Precis Mech, Xian 710119, Shaanxi, Peoples R China

[2] Northwestern Polytech Univ, Ctr Opt Imagery Anal & Learning, Xian 710072, Shaanxi, Peoples R China

Source:

IEEE TRANSACTIONS ON CYBERNETICS | 2018 / Vol. 48 / No. 06

Funding:

National Natural Science Foundation of China;

Keywords:

Diverse; key frame; representative; summary space; VIDEOS;

DOI:

10.1109/TCYB.2017.2718579

Chinese Library Classification:

TP [Automation technology, computer technology];

Subject classification code:

0812 ;

Abstract:

Key frame extraction is an efficient way to create a video summary that helps users quickly comprehend the video content. Generally, the key frames should be representative of the video content and, at the same time, diverse enough to reduce redundancy. Based on the assumption that video data lie near a subspace of a high-dimensional space, this paper proposes a new approach, named key frame extraction in the summary space. The proposed approach aims to find the representative frames of the video and filter out similar frames from the representative frame set. First, the video data are mapped to a high-dimensional space, named the summary space. Then, a new representation is learned for each frame by analyzing the intrinsic structure of the summary space. Specifically, the learned representation reflects the representativeness of the frame and is used to select representative frames. Next, a perceptual hash algorithm is employed to measure the similarity of representative frames, and the key frame set is obtained by filtering out similar frames from the representative frame set. Finally, the video summary is constructed by arranging the key frames in temporal order. Additionally, the ground truth, created by filtering out similar frames from human-created summaries, is used to evaluate the quality of the video summary. Experimental results on 80 videos from two datasets indicate that the proposed approach outperforms several traditional approaches.
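The filtering step described in the abstract (measure similarity between representative frames with a perceptual hash, drop near-duplicates, then order the survivors in time) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which perceptual hash variant is used, so a simple average hash stands in for it, and the names `average_hash`, `hamming`, `filter_similar` and the `max_distance` threshold are hypothetical. Frames are assumed to be already reduced to small grayscale thumbnails.

```python
from typing import List, Sequence, Tuple

# Assumption: each frame is a small grayscale thumbnail (rows of pixel values).
Frame = Sequence[Sequence[float]]


def average_hash(frame: Frame) -> int:
    """Average hash (a simple stand-in for a perceptual hash):
    one bit per pixel, set when the pixel is brighter than the mean."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def filter_similar(candidates: List[Tuple[int, Frame]], max_distance: int) -> List[int]:
    """Greedy near-duplicate removal.

    candidates: (frame_index, thumbnail) pairs, visited in the given order
    (hypothetically, most representative first).
    A frame is kept only if its hash differs from every kept hash by more
    than max_distance bits. Kept frame indices are returned in temporal order.
    """
    kept: List[Tuple[int, int]] = []  # (frame_index, hash)
    for index, frame in candidates:
        h = average_hash(frame)
        if all(hamming(h, kept_hash) > max_distance for _, kept_hash in kept):
            kept.append((index, h))
    return sorted(index for index, _ in kept)


# Toy usage: frames 42 and 43 look alike, frame 7 differs.
bright_left = [[200, 200, 10, 10]] * 4
bright_left_noisy = [[190, 210, 12, 8]] * 4
bright_right = [[10, 10, 200, 200]] * 4
candidates = [(42, bright_left), (7, bright_right), (43, bright_left_noisy)]
print(filter_similar(candidates, max_distance=2))  # [7, 42]
```

In this toy case the two left-bright thumbnails hash identically, so the later candidate is dropped, and the remaining indices are sorted to form the summary in temporal order. The threshold controls how aggressively similar frames are merged and would need tuning on real data.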

References

Pages: 1923-1934

Page count: 12

54 references in total

[1] VISON: Video Summarization for ONline applications [J].

Almeida, Jurandy ;

Leite, Neucimar J. ;

Torres, Ricardo da S. .

PATTERN RECOGNITION LETTERS, 2012, 33 (04) :397-409

[2] Aner A, 2002, LECT NOTES COMPUT SC, V2353, P388

[3] [Anonymous], 2015, ARXIV150801667

[4] Aoyagi S., 2003, Proceedings of the SPIE - The International Society for Optical Engineering, V5305, P178, DOI 10.1117/12.538808

[5] Practical Detection of Spammers and Content Promoters in Online Video Sharing Systems [J].

Benevenuto, Fabricio ;

Rodrigues, Tiago ;

Veloso, Adriano ;

Almeida, Jussara ;

Goncalves, Marcos ;

Almeida, Virgilio .

IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2012, 42 (03) :688-701

[6] Representation Learning: A Review and New Perspectives [J].

Bengio, Yoshua ;

Courville, Aaron ;

Vincent, Pascal .

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2013, 35 (08) :1798-1828

[7] Towards Scalable Summarization of Consumer Videos Via Sparse Dictionary Selection [J].

Cong, Yang ;

Yuan, Junsong ;

Luo, Jiebo .

IEEE TRANSACTIONS ON MULTIMEDIA, 2012, 14 (01) :66-75

[8] Duchi J., 2008, P 25 INT C MACH LEAR, P272, DOI 10.1145/1390156.1390191

[9] Adaptive key frame extraction for video summarization using an aggregation mechanism [J].

Ejaz, Naveed ;

Bin Tariq, Tayyab ;

Baik, Sung Wook .

JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2012, 23 (07) :1031-1040

[10] Automatic soccer video analysis and summarization [J].

Ekin, A ;

Tekalp, AM ;

Mehrotra, R .

IEEE TRANSACTIONS ON IMAGE PROCESSING, 2003, 12 (07) :796-807
