Differential Image-Based Scalable YOLOv7-Tiny Implementation for Clustered Embedded Systems

Times cited: 1

Authors:

Hong, Sunghoon [1]

Park, Daejin [2]

Affiliations:

[1] Kyungpook Natl Univ, Dept Elect & Elect Engn, Daegu 41566, South Korea

[2] Kyungpook Natl Univ, Sch Elect Engn, Daegu 41566, South Korea

Source:

IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS | 2024 / Vol. 25 / No. 11

Funding:

National Research Foundation of Singapore;

Keywords:

Detectors; Convolution; Accuracy; Computational complexity; Real-time systems; Feature extraction; Convolutional neural networks; Classification algorithms; Graphics processing units; Embedded systems; clustered systems; deep learning; fast convolution;

DOI:

10.1109/TITS.2024.3419095

Chinese Library Classification (CLC) number:

TU [Building science];

Discipline code:

0813;

Abstract:

Convolutional neural networks (CNNs) for powerful visual image analysis are gaining popularity in artificial intelligence. The main difference between CNNs and other artificial neural networks is the addition of many convolutional layers, which improve visual image analysis by extracting the feature maps required for image classification. However, running low-latency applications on edge compute modules with limited processing resources requires algorithm optimization. In this paper, we propose a novel algorithm optimization method for fast CNNs that uses continuous differential images. The main idea is to reduce computation variably by using the differential value of the input in each convolutional layer. The proposed method is compatible with all types of CNNs, and it performs better when the pixel-value difference between consecutive images is low. We use the DarkNet framework to evaluate our algorithm with fast convolution and half convolution approaches on a clustered system. As a result, when the input frame rate is 10 fps, FLOPs are reduced by about 4.92 times compared to the original YOLOv7-tiny. By reducing the FLOPs of the convolutional layers, the inference speed increases to about 4.86 FPS, 1.57 times faster than the original YOLOv7-tiny. With parallel processing on two edge compute modules using the half convolution approach, FLOPs are reduced further and the response speed improves. In addition, object detection can be made faster still by expanding the scalable clustered embedded system with additional compute modules, up to 7 in total, as the user requires.
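The identity that differential-image inference relies on can be sketched as follows. Because convolution is linear, the output for the current frame equals the cached output for the previous frame plus the convolution of the difference image, so pixels that did not change cost nothing. This is a minimal single-channel, stride-1, no-padding illustration in pure Python, not the authors' DarkNet implementation; the function names `conv2d_valid` and `delta_conv2d` are hypothetical, and the sketch assumes each layer caches its previous input and pre-activation output (the bias cancels in the difference).

```python
from typing import List, Tuple

Matrix = List[List[float]]


def conv2d_valid(x: Matrix, w: Matrix) -> Tuple[Matrix, int]:
    """Direct 2D cross-correlation (stride 1, no padding, no bias).

    Returns the output map and the number of multiply-accumulates (MACs) spent.
    """
    kh, kw = len(w), len(w[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    y = [[0.0] * ow for _ in range(oh)]
    macs = 0
    for i in range(oh):
        for j in range(ow):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    acc += x[i + a][j + b] * w[a][b]
                    macs += 1
            y[i][j] = acc
    return y, macs


def delta_conv2d(y_prev: Matrix, x_prev: Matrix, x_curr: Matrix,
                 w: Matrix) -> Tuple[Matrix, int]:
    """Update the cached output of the previous frame using only changed pixels.

    By linearity, conv(x_curr) = conv(x_prev) + conv(x_curr - x_prev), so each
    nonzero difference pixel is scattered into the outputs that depend on it.
    """
    kh, kw = len(w), len(w[0])
    oh, ow = len(y_prev), len(y_prev[0])
    y = [row[:] for row in y_prev]  # start from the previous frame's output
    macs = 0
    for r in range(len(x_curr)):
        for c in range(len(x_curr[0])):
            d = x_curr[r][c] - x_prev[r][c]
            if d == 0:
                continue  # unchanged pixel: no work at all
            # Output (i, j) reads input (i + a, j + b), so pixel (r, c)
            # reaches outputs i = r - a, j = c - b that lie inside the map.
            for a in range(kh):
                i = r - a
                if not 0 <= i < oh:
                    continue
                for b in range(kw):
                    j = c - b
                    if 0 <= j < ow:
                        y[i][j] += d * w[a][b]
                        macs += 1
    return y, macs


if __name__ == "__main__":
    w = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]  # arbitrary 3x3 kernel
    prev = [[(r * 6 + c) % 7 for c in range(6)] for r in range(6)]
    curr = [row[:] for row in prev]
    curr[2][2] += 5  # a single pixel changes between consecutive frames

    y_prev, full_macs = conv2d_valid(prev, w)
    y_ref, _ = conv2d_valid(curr, w)
    y_inc, delta_macs = delta_conv2d(y_prev, prev, curr, w)
    assert y_inc == y_ref  # incremental result matches full recomputation
    print(full_macs, delta_macs)  # 144 9
```

The cost of the update scales with the number of changed pixels, which is consistent with the abstract's observation that the method performs better when consecutive images differ little. In a full network such as YOLOv7-tiny, the nonlinear activations between layers mean each convolutional layer would have to track the difference of its own input rather than of the raw frame.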

Citation details:

Pages: 16036–16047

Page count: 12

Total references: 39
