PIT: Progressive Interaction Transformer for Pedestrian Crossing Intention Prediction

Cited by: 37
Authors
Zhou, Yuchen [1]
Tan, Guang [1]
Zhong, Rui [1]
Li, Yaokun [1]
Gou, Chao [1]
Affiliations
[1] Sun Yat-sen University, School of Intelligent Systems Engineering, Shenzhen 518107, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Pedestrians; Transformers; Vehicle dynamics; Vehicles; Autonomous vehicles; Predictive models; Modeling; Pedestrian intention prediction; transformer; traffic scene understanding; intelligent vehicle; neuroscience;
DOI
10.1109/TITS.2023.3309309
Chinese Library Classification (CLC) number
TU [Building Science]
Discipline classification code
0813
Abstract
For autonomous driving, one of the major challenges is to predict pedestrian crossing intention from the ego view. Pedestrian intention depends not only on the pedestrian's intrinsic goals but also on stimuli from surrounding traffic elements. To account for this influence, recent work has fed more traffic-element information into the model and improved performance. However, it remains difficult to capture and fully exploit the dynamic spatio-temporal interactions between the target pedestrian and the surrounding traffic elements for accurate reasoning. In this work, inspired by neuroscience findings that human drivers tend to make continuous sensory-motor driving decisions under progressive visual stimulation, we propose a model termed Progressive Interaction Transformer (PIT) for pedestrian crossing intention prediction. PIT considers the local pedestrian, the global environment, and ego-vehicle motion simultaneously. In particular, a temporal fusion block and a self-attention mechanism jointly and progressively model the dynamic spatio-temporal interactions among the three, allowing the model to capture richer information and make predictions in a way similar to human drivers. Experimental results demonstrate that PIT outperforms other state-of-the-art methods while preserving real-time inference.
Pages: 14213-14225
Page count: 13