Real-time semantic segmentation on FPGAs for autonomous vehicles with hls4ml

Cited by: 14

Authors:

Ghielmetti, Nicolo [1,8]

Loncar, Vladimir [1,9]

Pierini, Maurizio [1]

Roed, Marcel [1,10]

Summers, Sioni [1]

Aarrestad, Thea [2]

Petersson, Christoffer [3,11]

Linander, Hampus [4]

Ngadiuba, Jennifer [5]

Lin, Kelvin [6,12]

Harris, Philip [7]

Affiliations:

[1] European Org Nucl Res CERN, CH-1211 Geneva 23, Switzerland

[2] Swiss Fed Inst Technol, Inst Particle Phys & Astrophys, CH-8093 Zurich, Switzerland

[3] Zenseact, S-41756 Gothenburg, Sweden

[4] Univ Gothenburg, S-40530 Gothenburg, Sweden

[5] Fermilab Natl Accelerator Lab, Batavia, IL 60510 USA

[6] Univ Washington, Seattle, WA 98195 USA

[7] MIT, Cambridge, MA 02139 USA

[8] Politecn Milan, Milan, Italy

[9] Inst Phys Belgrade, Belgrade, Serbia

[10] Univ Oxford, Oxford, England

[11] Chalmers Univ Technol, Gothenburg, Sweden

[12] Amazon, Seattle, WA USA

Source:

MACHINE LEARNING-SCIENCE AND TECHNOLOGY | 2022 / Vol. 3 / No. 04

Funding:

European Research Council;

Keywords:

FPGA; computer vision; deep learning; hls4ml; machine learning; autonomous vehicles; semantic segmentation;

DOI:

10.1088/2632-2153/ac9cb5

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we investigate how field programmable gate arrays can serve as hardware accelerators for real-time semantic segmentation tasks relevant to autonomous driving. Considering compressed versions of the ENet convolutional neural network architecture, we demonstrate a fully-on-chip deployment with a latency of 4.9 ms per image, using less than 30% of the available resources on a Xilinx ZCU102 evaluation board. The latency is reduced to 3 ms per image when increasing the batch size to ten, corresponding to the use case where the autonomous vehicle receives inputs from multiple cameras simultaneously. We show that, through aggressive filter reduction, heterogeneous quantization-aware training, and an optimized implementation of convolutional layers, the power consumption and resource utilization can be significantly reduced while maintaining accuracy on the Cityscapes dataset.


Pages: 10

