High-accuracy low-latency non-maximum suppression processor for traffic object detection
Authors:
Yuan, Chenbo [1,2,3]
Xu, Peng [1]
Chen, Gang [1,2,3]
Affiliations:
[1] Chinese Acad Sci, Inst Semicond, Beijing 10083, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 10089, Peoples R China
[3] Semicond Neural Network Intelligent & Comp Techno, Beijing 100083, Peoples R China
As autonomous driving technology advances, the demands placed on object detection keep rising. The non-maximum suppression (NMS) algorithm, a key component of traffic object detection, runs as an independent post-processing step in the detection framework. Because real-world road scenes are complex and urban traffic contains a high density of objects, the neural network generates a large number of candidate bounding boxes. Low-precision processors may therefore output many redundant bounding boxes, which not only add to the workload of subsequent processing but can also lead to non-optimal decision-making. We propose a high-performance NMS processor that quickly processes a large number of candidate boxes without sorting them by score. It also provides computing units with low precision loss and highly parallel computing arrays. Combined with the algorithm design, the processor reduces computational complexity and shortens the end-to-end inference time of the NMS task. As a result, its speed is comparable to that of state-of-the-art architectures, and the average accuracy loss is only 0.4%.
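For readers unfamiliar with NMS, the following is a minimal sketch of the conventional greedy, sort-based algorithm that serves as the usual software baseline. It is not the processor design proposed in the abstract, which avoids score sorting. The corner box format `(x1, y1, x2, y2)`, the function names `iou` and `greedy_nms`, and the 0.5 IoU threshold are assumptions chosen for illustration.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) corners, with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Conventional greedy NMS: return indices of kept boxes, best score first."""
    # The score sort below is the step the abstract says the processor avoids.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Illustrative data only: two heavily overlapping boxes and one separate box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
```

In this example the second box overlaps the first with an IoU of 81/119 (about 0.68), so it is suppressed, while the third box does not overlap either and is kept. The sort and the pairwise IoU checks are the costs that grow with the number of candidate boxes in dense urban scenes.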