PARTS BASED ATTENTION FOR HIGHLY OCCLUDED PEDESTRIAN DETECTION WITH TRANSFORMERS

Times cited: 0

Authors:

Shastry, K. N. Ajay [1]

Chaudhari, Jayesh [1]

Thapar, Daksh [2]

Nigam, Aditya [2]

Arora, Chetan [1]

Affiliations:

[1] Indian Inst Technol, Delhi, India

[2] Indian Inst Technol, Mandi, Himachal Pradesh, India

Source:

2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2023

DOI:

10.1109/ICIP49359.2023.10222651

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Despite the significant progress made in pedestrian detection in the last decade, detecting pedestrians under heavy occlusion remains a challenging problem. In state-of-the-art (SOTA) convolutional neural network (CNN) based models, the failure is attributed to non-maximum suppression (NMS), which often erroneously deletes true positives when one pedestrian occludes another. SOTA transformer-based models have no such NMS step, yet they still fail to detect highly occluded pedestrians. In this paper, we study the reasons for these failures. We observe that such models first predict key-points and then compute attention at those specific key-points. Our analysis reveals that the key-points have no preference towards semantically important body parts. Under heavy occlusion, such key-points end up attending to non-discriminative regions or to the background, leading to false negatives. Taking inspiration from the conventional wisdom of detecting objects by their parts, we bias the attention of the proposed transformer architecture towards semantically important and highly discriminative human body parts. This intervention leads to SOTA results on the Citypersons and Caltech benchmarks, with miss rates (lower is better) of 30.75% and 32.96% respectively, against 32.6% and 38.2% for the current SOTA. Code is available at https://ajayshastry08.github.io/pa_dino
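The abstract reports results as a miss rate where lower is better. On Citypersons and Caltech, the customary summary metric is the log-average miss rate (often written MR^-2): the geometric mean of the miss rate sampled at nine FPPI (false positives per image) values evenly spaced in log space over [10^-2, 10^0]. This record does not state which variant the paper uses, so treat that as an assumption. The minimal Python sketch below illustrates the computation. The function name `log_average_miss_rate` and its inputs are hypothetical, and the official evaluation toolkits differ in details, such as how reference points are matched to operating points.

```python
import numpy as np


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Simplified MR^-2-style log-average miss rate (illustrative sketch only).

    fppi, miss_rate: 1-D sequences describing a detector's operating points,
    assumed sorted by increasing FPPI (i.e. decreasing score threshold).
    """
    fppi = np.asarray(fppi, dtype=float)
    miss_rate = np.asarray(miss_rate, dtype=float)
    # Nine reference FPPI values evenly spaced in log space over [1e-2, 1e0].
    refs = np.logspace(-2.0, 0.0, num_points)
    # Reference points the detector never reaches count as a miss rate of 1.
    sampled = np.ones(num_points)
    for i, ref in enumerate(refs):
        idx = np.nonzero(fppi <= ref)[0]
        if idx.size > 0:
            # Use the last operating point whose FPPI does not exceed the reference.
            sampled[i] = miss_rate[idx[-1]]
    # Geometric mean of the sampled miss rates; clamp to avoid log(0).
    return float(np.exp(np.mean(np.log(np.maximum(sampled, 1e-10)))))
```

For example, a detector whose miss rate is 0.3 at every operating point scores 0.3. A detector that never reaches an FPPI of 1 or lower scores 1.0, because this sketch counts unreached reference points as complete misses.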

Pages: 3085-3089

Number of pages: 5