Key-Frame Extraction for Reducing Human Effort in Object Detection Training for Video Surveillance

Cited by: 2
Authors
Sinulingga, Hagai R. [1 ]
Kong, Seong G. [1 ]
Affiliations
[1] Sejong Univ, Dept Comp Engn, Seoul 05006, South Korea
Keywords
object detection; video surveillance; key-frame extraction; interactive labeling; deep learning; localization
DOI
10.3390/electronics12132956
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
This paper presents a supervised learning scheme that employs key-frame extraction to enhance the performance of pre-trained deep learning models for object detection in surveillance videos. Developing supervised deep learning models requires a large number of annotated video frames as training data, which demands substantial human effort to prepare. Key frames, namely frames containing false-negative or false-positive detections, introduce diversity into the training data and contribute to model improvement. The proposed approach focuses on detecting false negatives by leveraging the motion information in video frames together with the detected object regions. Key-frame extraction significantly reduces the human effort involved in selecting video frames for annotation. Interactive labeling is used to annotate the false-negative frames with accurate bounding boxes and labels. These annotated frames are then merged with the existing training data to form a comprehensive dataset for subsequent training cycles. Repeating the training cycles gradually improves the object detection performance of the deep learning model when monitoring a new environment. Experimental results demonstrate that the proposed learning approach improves the performance of the object detection model in a new operating environment, increasing the mean average precision (mAP@0.5) from 54% to 98%. The proposed key-frame extraction method reduces the manual annotation of key frames by 81%.
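The abstract does not give implementation details, but the motion-based false-negative test can be sketched as follows. This is a minimal illustration in Python/OpenCV under stated assumptions: dense Farneback optical flow stands in for the unspecified motion cue, detector is a hypothetical callable returning integer (x1, y1, x2, y2) boxes, and both thresholds are illustrative placeholders. A frame is flagged as a key frame when most of its motion falls outside every detected box, i.e., a likely false negative.

    import cv2
    import numpy as np

    def extract_key_frames(video_path, detector,
                           motion_thresh=1.0, coverage_thresh=0.5):
        """Flag frames whose moving pixels are poorly covered by the
        detector's boxes -- candidate false negatives (key frames)."""
        cap = cv2.VideoCapture(video_path)
        ok, prev = cap.read()
        if not ok:
            return []
        prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
        key_frames, idx = [], 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            idx += 1
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            # Dense two-frame motion estimation (Farneback); one common
            # choice, assumed here since the abstract names no method.
            flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            moving = np.linalg.norm(flow, axis=2) > motion_thresh
            covered = np.zeros(moving.shape, dtype=bool)
            for (x1, y1, x2, y2) in detector(frame):   # hypothetical API
                covered[y1:y2, x1:x2] = True
            # Motion largely outside all detections suggests a missed
            # (false-negative) object, so keep the frame for labeling.
            n_moving = moving.sum()
            if n_moving > 0 and (moving & covered).sum() / n_moving < coverage_thresh:
                key_frames.append(idx)
            prev_gray = gray
        cap.release()
        return key_frames

Frames selected by such a filter would then go to interactive labeling, and the corrected annotations would be merged into the training set for the next training cycle, as the abstract describes.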
Pages: 14