Group-of-Picture Mode Acceleration for Efficient Object Detection in Video Streams

Cited by: 0
Authors
Chen, Kuan-Hung [1]
Affiliations
[1] Feng Chia Univ, Dept Elect Engn, Taichung 40724, Taiwan
Keywords
Object detection; Computational modeling; Artificial intelligence; Feature extraction; Object tracking; Computer architecture; Video compression; Convolutional neural network; group of picture; mobile AI; video processing
DOI
10.1109/ACCESS.2023.3294558
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Modern object detection models could be widely used in applications such as autonomous vehicles, but they are computationally demanding, and high-resolution image data increases the burden further. We therefore propose an acceleration method for object detection in video sequences, called Group of Picture (GoP) mode, which removes temporal redundancy rather than compressing the model as existing schemes do. A GoP structure consists of exactly one key frame followed by several non-key frames. In GoP mode, object detection runs on key frames only, while object tracking predicts the position of each object in the subsequent non-key frames from its tracked trajectory and momentum. The resulting latency savings speed up execution several-fold, so that high detection accuracy and high execution speed can be obtained together. In theory, with a GoP structure of one key frame and N non-key frames, GoP mode accelerates object detection by a factor of (1+N). We analyze how the number of non-key frames affects the accuracy of a detector equipped with GoP mode. In our experiments, the mean average precision (mAP) of GoP mode with four non-key frames per GoP structure is competitive with that of running object detection on every frame. Meanwhile, the frame rate on the Jetson Nano mobile platform rises from the original 8 frames per second (FPS) to 35.8 FPS, i.e., an increase of 348% (about 4.5 times the original rate).
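The scheduling idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `detect` callback, the `(x, y, w, h)` box layout, the stable object IDs, and the constant-velocity predictor standing in for "trajectory and momentum" are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h); layout is an assumption
Detections = Dict[int, Box]              # object id -> box; ids assumed stable


def theoretical_speedup(n_non_key: int, tracking_cost_ratio: float = 0.0) -> float:
    """Speedup of GoP mode over running detection on every frame.

    tracking_cost_ratio is the cost of tracking one frame relative to
    detecting one frame (hypothetical parameter); 0.0 gives the ideal (1 + N).
    """
    return (1 + n_non_key) / (1.0 + n_non_key * tracking_cost_ratio)


def predict_box(box: Box, velocity: Tuple[float, float]) -> Box:
    """Shift a box by one frame of constant-velocity motion."""
    x, y, w, h = box
    return (x + velocity[0], y + velocity[1], w, h)


def run_gop(num_frames: int, n_non_key: int,
            detect: Callable[[int], Detections]) -> List[Detections]:
    """Detect on key frames only; extrapolate boxes on non-key frames."""
    gop_len = 1 + n_non_key
    last_key: Detections = {}   # detections from the previous key frame
    current: Detections = {}    # boxes reported for the current frame
    velocity: Dict[int, Tuple[float, float]] = {}
    output: List[Detections] = []
    for t in range(num_frames):
        if t % gop_len == 0:
            # Key frame: run the (expensive) detector and refresh velocities.
            detected = detect(t)
            velocity = {}
            for oid, box in detected.items():
                if oid in last_key:
                    prev = last_key[oid]
                    velocity[oid] = ((box[0] - prev[0]) / gop_len,
                                     (box[1] - prev[1]) / gop_len)
                else:
                    velocity[oid] = (0.0, 0.0)  # no history yet
            last_key = dict(detected)
            current = dict(detected)
        else:
            # Non-key frame: no detector call, only cheap extrapolation.
            current = {oid: predict_box(box, velocity[oid])
                       for oid, box in current.items()}
        output.append(dict(current))
    return output
```

With N = 4 the ideal bound from `theoretical_speedup` is 5 times; the reported 8 to 35.8 FPS is somewhat below that, which would be consistent with tracking having a small nonzero cost per frame, although the abstract does not break this down.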
Pages: 71668-71682
Page count: 15
Related papers
34 in total
  • [1] Bewley A, 2016, IEEE IMAGE PROC, P3464, DOI 10.1109/ICIP.2016.7533003
  • [2] Bochinski Erik, 2017, 2017 14th IEEE International Conference on Advanced Video and Signal-Based Surveillance (AVSS), DOI 10.1109/AVSS.2017.8078516
  • [3] Bochkovskiy A, 2020, Arxiv, DOI arXiv:2004.10934
  • [4] Cascade R-CNN: Delving into High Quality Object Detection
    Cai, Zhaowei
    Vasconcelos, Nuno
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 6154 - 6162
  • [5] HarDNet: A Low Memory Traffic Network
    Chao, Ping
    Kao, Chao-Yang
    Ruan, Yu-Shan
    Huang, Chien-Hsiang
    Lin, Youn-Long
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 3551 - 3560
  • [6] Block-Composed Background Reference for High Efficiency Video Coding
    Chen, Fangdong
    Li, Houqiang
    Li, Li
    Liu, Dong
    Wu, Feng
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2017, 27 (12) : 2639 - 2651
  • [7] The Joint Exploration Model (JEM) for Video Compression With Capability Beyond HEVC
    Chen, Jianle
    Karczewicz, Marta
    Huang, Yu-Wen
    Choi, Kiho
    Ohm, Jens-Rainer
    Sullivan, Gary J.
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2020, 30 (05) : 1208 - 1225
  • [8] Chen KH, 2019, INT SOC DESIGN CONF, P154, DOI 10.1109/ISOCC47750.2019.9027682
  • [9] Aggregate Tracklet Appearance Features for Multi-Object Tracking
    Chen, Long
    Ai, Haizhou
    Chen, Rui
    Zhuang, Zijie
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2019, 26 (11) : 1613 - 1617
  • [10] Chen YH, 2016, ISSCC DIG TECH PAP I, V59, P262, DOI 10.1109/ISSCC.2016.7418007