Group-of-Picture Mode Acceleration for Efficient Object Detection in Video Streams

Cited by: 0

Authors:

Chen, Kuan-Hung [1]

Affiliations:

[1] Feng Chia Univ, Dept Elect Engn, Taichung 40724, Taiwan

Source:

IEEE Access | 2023 / Vol. 11

Keywords:

Object detection; Computational modeling; Artificial intelligence; Feature extraction; Object tracking; Computer architecture; Video compression; Convolutional neural network; group of picture; mobile AI; object detection; object tracking; video processing;

DOI:

10.1109/ACCESS.2023.3294558

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Although modern object detection AI models could be widely used in applications such as autonomous vehicles, they are computationally demanding, and high-resolution image data increases the burden further. We therefore propose an acceleration method for object detection in video sequences called Group of Picture (GoP) mode. Unlike existing model compression schemes, it works by removing temporal redundancy. A GoP structure consists of exactly one key frame followed by several non-key frames. In GoP mode, object detection runs on key frames only, while object tracking predicts the position of each object in the following non-key frames from that object's tracked trajectory and momentum. The resulting latency savings speed up execution several-fold, so that high detection accuracy and high execution speed can be obtained together. In theory, with a GoP structure of one key frame and N non-key frames, GoP mode accelerates object detection by a factor of (1+N). We analyze how the number of non-key frames affects the accuracy of an object detector equipped with GoP mode. According to the experimental results, the mean average precision (mAP) of GoP mode with four non-key frames per GoP structure is competitive with that of running object detection on every frame. Meanwhile, the execution frame rate on the Jetson Nano mobile platform rises from the original 8 frames per second (FPS) to 35.8 FPS, i.e., an increase of about 348% (roughly 4.5 times the original frame rate).
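The GoP schedule described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `Track` class, the `(x, y, w, h)` box layout, the `detect` callback, and the constant-velocity predictor are all assumptions introduced here, and detections are matched across key frames by list index, which a real tracker would replace with proper data association.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # hypothetical layout: (x, y, w, h)


@dataclass
class Track:
    box: Box
    velocity: Tuple[float, float] = (0.0, 0.0)  # per-frame (dx, dy)


def theoretical_speedup(n_non_key: int) -> int:
    """Ideal speedup from the abstract: one detector call per (1 + N) frames."""
    return 1 + n_non_key


def predict(track: Track) -> Track:
    """Constant-velocity step (an assumed stand-in for the paper's tracker)."""
    x, y, w, h = track.box
    dx, dy = track.velocity
    return Track((x + dx, y + dy, w, h), track.velocity)


def run_gop(
    frames: Sequence[object],
    detect: Callable[[object], List[Box]],
    n_non_key: int,
) -> Tuple[List[List[Box]], int]:
    """Run the detector on key frames only; extrapolate boxes on non-key frames."""
    gop_len = 1 + n_non_key
    tracks: List[Track] = []
    last_key_boxes: List[Box] = []
    outputs: List[List[Box]] = []
    detector_calls = 0

    for i, frame in enumerate(frames):
        if i % gop_len == 0:  # key frame: full detection
            boxes = list(detect(frame))
            detector_calls += 1
            new_tracks = []
            for j, box in enumerate(boxes):
                if j < len(last_key_boxes):
                    # Assumption: index j is the same object as in the last key frame.
                    px, py = last_key_boxes[j][0], last_key_boxes[j][1]
                    vel = ((box[0] - px) / gop_len, (box[1] - py) / gop_len)
                else:
                    vel = (0.0, 0.0)
                new_tracks.append(Track(box, vel))
            tracks = new_tracks
            last_key_boxes = boxes
        else:  # non-key frame: cheap prediction instead of detection
            tracks = [predict(t) for t in tracks]
        outputs.append([t.box for t in tracks])

    return outputs, detector_calls
```

With four non-key frames per GoP, the detector runs once every five frames, which gives the ideal factor of 5 if the cost of tracking is treated as negligible. The reported 35.8 FPS against the original 8 FPS is a ratio of about 4.5, somewhat below that ideal.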

Citation

Pages: 71668-71682

Page count: 15

