Key-Frame Extraction for Reducing Human Effort in Object Detection Training for Video Surveillance

Cited by: 2
Authors
Sinulingga, Hagai R. [1 ]
Kong, Seong G. [1 ]
Affiliations
[1] Sejong Univ, Dept Comp Engn, Seoul 05006, South Korea
Keywords
object detection; video surveillance; key-frame extraction; interactive labeling; deep learning; localization
DOI
10.3390/electronics12132956
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
This paper presents a supervised learning scheme that employs key-frame extraction to enhance the performance of pre-trained deep learning models for object detection in surveillance videos. Developing supervised deep learning models requires a large number of annotated video frames as training data, which demands substantial human effort to prepare. Key frames, namely frames containing false-negative or false-positive detections, introduce diversity into the training data and contribute to model improvement. The proposed approach focuses on detecting false negatives by leveraging the motion information in video frames together with the detected object regions. Key-frame extraction significantly reduces the human effort involved in selecting video frames for annotation. Interactive labeling is used to annotate the false-negative frames with accurate bounding boxes and labels. These annotated frames are then merged with the existing training data to form a comprehensive dataset for subsequent training cycles. Repeating the training cycles gradually improves the object detection performance of the deep learning model when monitoring a new environment. Experimental results demonstrate that the proposed learning approach improves the performance of the object detection model in a new operating environment, increasing the mean average precision (mAP@0.5) from 54% to 98%. The proposed key-frame extraction method reduces the manual annotation of key frames by 81%.
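The abstract does not give implementation details, but the motion-based false-negative test can be sketched as follows. This is a minimal illustration in Python/OpenCV under stated assumptions: dense Farneback optical flow stands in for the unspecified motion cue, detector is a hypothetical callable returning integer (x1, y1, x2, y2) boxes, and both thresholds are illustrative placeholders. A frame is flagged as a key frame when most of its motion falls outside every detected box, i.e., a likely false negative.

    import cv2
    import numpy as np

    def extract_key_frames(video_path, detector,
                           motion_thresh=1.0, coverage_thresh=0.5):
        """Flag frames whose moving pixels are poorly covered by the
        detector's boxes -- candidate false negatives (key frames)."""
        cap = cv2.VideoCapture(video_path)
        ok, prev = cap.read()
        if not ok:
            return []
        prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
        key_frames, idx = [], 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            idx += 1
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            # Dense two-frame motion estimation (Farneback); one common
            # choice, assumed here since the abstract names no method.
            flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            moving = np.linalg.norm(flow, axis=2) > motion_thresh
            covered = np.zeros(moving.shape, dtype=bool)
            for (x1, y1, x2, y2) in detector(frame):   # hypothetical API
                covered[y1:y2, x1:x2] = True
            # Motion largely outside all detections suggests a missed
            # (false-negative) object, so keep the frame for labeling.
            n_moving = moving.sum()
            if n_moving > 0 and (moving & covered).sum() / n_moving < coverage_thresh:
                key_frames.append(idx)
            prev_gray = gray
        cap.release()
        return key_frames

Frames selected by such a filter would then go to interactive labeling, and the corrected annotations would be merged into the training set for the next training cycle, as the abstract describes.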
Pages: 14