Quality-guided key frames selection from video stream based on object detection

Cited by: 37
Authors
Chen, Mingju [1 ,2 ]
Han, Xiaofeng [3 ]
Zhang, Hua [1 ]
Lin, Guojun [2 ]
Kamruzzaman, M. M. [4 ]
Affiliations
[1] Southwest Univ Sci & Technol, Robot Technol Used Special Environm Key Lab Sichu, Mianyang 621010, Sichuan, Peoples R China
[2] Sichuan Univ Sci & Engn, Artificial Intelligence Key Lab Sichuan Prov, Zigong 643000, Peoples R China
[3] Shandong Univ Sci & Technol, Coll Math & Syst Sci, Qingdao, Shandong, Peoples R China
[4] Jouf Univ, Dept Comp & Informat Sci, Sakaka, Al Jouf, Saudi Arabia
Keywords
Convolutional neural network; Key frame; Object detection; SIFT characteristics; Quality assessment model; KEYFRAME EXTRACTION; DESIGN; STATE;
DOI
10.1016/j.jvcir.2019.102678
Chinese Library Classification (CLC)
TP [automation technology, computer technology]
Discipline code
0812
Abstract
Object detection is widely applied in modern intelligent systems, such as pedestrian tracking and video surveillance. Key frame selection aims to pick out the most informative frames from a video stream and to discard redundant ones. Traditional methods rely on SIFT features and suffer from high key-frame selection error rates. In this paper, we propose a novel key frame selection method based on object detection and image quality. Specifically, we first apply an object detector to locate objects of interest, such as pedestrians and vehicles. Each training frame is then assigned a quality score, with frames that contain objects receiving higher scores. Afterwards, we extract deep feature representations with a CNN based on the AlexNet architecture. Our algorithm combines mutual information entropy with SURF local image features to extract the key frames. Comprehensive experiments verify the feasibility of training the CNN-based key frame extractor and support a study of the quality assessment model. (C) 2019 Elsevier Inc. All rights reserved.
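The abstract's pipeline, quality scoring followed by an information-based redundancy check, can be sketched in miniature. The snippet below is an illustrative toy, not the paper's implementation: detector confidences are passed in as precomputed per-frame quality scores (a hypothetical input), frames are plain NumPy arrays instead of video, and the mutual-information threshold is an arbitrary assumption. A frame is kept as a key frame only if its quality score is high enough and it shares little mutual information with the last selected key frame.

```python
import numpy as np

def mutual_information(a, b, bins=16):
    """Mutual information (in nats) between two frames via a joint histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()                  # joint distribution
    px = pxy.sum(axis=1, keepdims=True)        # marginal of frame a
    py = pxy.sum(axis=0, keepdims=True)        # marginal of frame b
    nz = pxy > 0                               # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def select_key_frames(frames, quality_scores, mi_thresh=1.0, min_quality=0.5):
    """Keep high-quality frames that differ enough from the last key frame.

    quality_scores stands in for detector-derived frame quality (assumed
    precomputed here); mi_thresh and min_quality are illustrative values.
    """
    keys, last = [], None
    for i, (frame, q) in enumerate(zip(frames, quality_scores)):
        if q < min_quality:                    # no confident detection: skip
            continue
        if last is None or mutual_information(frames[last], frame) < mi_thresh:
            keys.append(i)                     # novel content: keep as key frame
            last = i
    return keys
```

A duplicated frame shares nearly all of its information with the previous key frame (MI close to the histogram entropy), so it is rejected, while an unrelated frame has MI near zero and is accepted.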
Pages: 7