Benchmarking Deep Learning Frameworks and Investigating FPGA Deployment for Traffic Sign Classification and Detection

Cited by: 18
Authors
Lin, Zhongyi [1 ]
Yih, Matthew [1 ]
Ota, Jeffrey M. [2 ]
Owens, John D. [1 ]
Muyan-Ozcelik, Pinar [3 ]
Affiliations
[1] Univ Calif Davis, Dept Elect & Comp Engn, Davis, CA 95616 USA
[2] Intel Labs, Autonomous Driving & Sports Res Grp, Santa Clara, CA 95054 USA
[3] Calif State Univ Sacramento, Dept Comp Sci, Sacramento, CA 95819 USA
Source
IEEE TRANSACTIONS ON INTELLIGENT VEHICLES | 2019, Vol. 4, No. 3
Keywords
Computational and artificial intelligence; field programmable gate arrays (FPGAs); image processing; intelligent vehicles; machine learning
DOI
10.1109/TIV.2019.2919458
CLC Classification Code
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We benchmark several widely used deep learning frameworks and investigate field programmable gate array (FPGA) deployment for traffic sign classification and detection. We evaluate the training speed and inference accuracy of these frameworks on the graphics processing unit (GPU) by training FPGA-deployment-suitable models with various input sizes on the German Traffic Sign Recognition Benchmark (GTSRB), a traffic sign classification dataset. Then, selected trained classification models, together with object detection models that we train on GTSRB's detection counterpart (the German Traffic Sign Detection Benchmark), are evaluated for inference speed, accuracy, and FPGA power efficiency while varying parameters such as floating-point precision and batch size. We find that Neon and MXNet deliver the best training speed and classification accuracy on the GPU across all test cases, while TensorFlow is always among the frameworks with the highest inference accuracy. We observe that with the current OpenVINO release, lightweight models (e.g., MobileNet-v1-SSD) usually exceed the requirements of real-time detection without losing much accuracy, whereas heavier models (e.g., VGG-SSD, ResNet-50-SSD) generally fail to do so. We also demonstrate that the precision of bitstreams and the batch size can be adjusted to balance the inference speed and accuracy of applications deployed on the FPGA. Finally, we show that for all test cases, the FPGA achieves higher power efficiency than the GPU.
Pages: 385-395 (11 pages)