A Real-Time Object Detection Accelerator with Compressed SSDLite on FPGA

Times cited: 60
Authors
Fan, Hongxiang [1]
Liu, Shuanglong [1]
Ferianc, Martin [1]
Ng, Ho-Cheung [1]
Que, Zhiqiang [1]
Liu, Shen [1]
Niu, Xinyu [2]
Luk, Wayne [1]
Affiliations
[1] Imperial Coll London, Sch Engn, Dept Comp, London, England
[2] Corerain Technol Ltd, Shenzhen, Peoples R China
Source
2018 INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE TECHNOLOGY (FPT 2018) | 2018
Funding
European Union Horizon 2020; UK Engineering and Physical Sciences Research Council (EPSRC)
DOI
10.1109/FPT.2018.00014
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Convolutional neural network (CNN)-based object detection has been widely employed in various applications such as autonomous driving and intelligent video surveillance. However, the computational complexity of conventional convolution hinders its application in embedded systems. Recently, a mobile-friendly CNN model, SSDLite-MobileNetV2 (SSDLiteM2), has been proposed for object detection. This model is built from a novel layer called the bottleneck residual block (BRB). Although SSDLiteM2 contains far fewer parameters and computations than conventional CNN models, its performance on embedded devices still cannot meet the requirements of real-time processing. This paper proposes a novel FPGA-based architecture for SSDLiteM2, combined with hardware optimizations including fused BRB, processing element (PE) sharing and load-balanced channel pruning. Moreover, a novel quantization scheme called partial quantization has been developed, which partially quantizes SSDLiteM2 to 8 bits with only 1.8% accuracy loss. Experiments show that the proposed design on a Xilinx ZC706 device can achieve up to 65 frames per second with a mean average precision (mAP) of 20.3 on the COCO dataset.
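The abstract does not describe how partial quantization works. As a point of reference only, the sketch below shows plain symmetric per-tensor 8-bit quantization of a weight tensor, the generic building block that 8-bit schemes typically start from. It is not the authors' partial quantization method, and the function names `quantize_int8` and `dequantize` are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to signed 8-bit integers.

    Generic illustration only (hypothetical helper), not the paper's
    partial quantization scheme.
    """
    if not values:
        return [], 1.0
    # One scale for the whole tensor: the largest magnitude maps to 127.
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate real values."""
    return [x * scale for x in q]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.003, 1.0]
    codes, s = quantize_int8(weights)
    print(codes, s)
    print(dequantize(codes, s))
```

With round-to-nearest, each dequantized value differs from the original by at most half the scale. Per the abstract, partial quantization applies 8-bit quantization to only part of SSDLiteM2; which parts stay at higher precision is not stated here.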
Pages: 17-24
Number of pages: 8