Robust Pedestrian Crossing Intention Prediction via Uncertainty-Guided Transformer Ensemble Network for Autonomous Driving

Cited by: 0
Authors
Chen, Xiaobo [1]
Zhang, Shilin [1]
Xu, Wei [1]
Cheng, Dapeng [1]
Yang, Lei [2]
Affiliations
[1] Shandong Technology and Business University, School of Computer Science and Technology, Yantai 264005, People's Republic of China
[2] Alibaba Group, Hangzhou 310099, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Pedestrians; Transformers; Predictive models; Feature extraction; Correlation; Skeleton; Autonomous vehicles; Data models; Data mining; Accuracy; Cross-modal transformer; crossing intention prediction; ensemble learning; feature fusion
DOI
10.1109/TIM.2025.3575998
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
Predicting pedestrian crossing behavior is becoming increasingly important for autonomous vehicles, especially in urban traffic scenarios. Most previous methods focus on feature-level fusion, which integrates various types of input data, without considering the prediction made from each individual input. To overcome this limitation, this article proposes an uncertainty-guided transformer ensemble network (UTENet) that exploits the merits of both feature-level and decision-level fusion in a unified framework. The proposed model takes only the pedestrian bounding box (bbox) and the ego-vehicle velocity as input. First, for each input, we apply the self-attention mechanism to model intramodal correlations and aggregate the correlated features across time steps. Then, we put forward a cross-modal attention-based fusion module that captures the intermodal relationships between the two inputs, so that a more comprehensive representation of crossing intention can be generated. Finally, we design an uncertainty-based ensemble strategy for decision-level fusion, which compensates for the weaknesses of individual predictions and enhances robustness. Experimental results on a real-world benchmark dataset show that our model can predict pedestrian crossing behavior from fewer input modalities while achieving performance comparable to, or even better than, that of methods relying on more inputs. Extensive ablation studies are also provided to verify the effectiveness of the model components.
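The three stages described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the UTENet implementation: learned query/key/value projections, multi-head attention, and the network's actual uncertainty estimate are omitted or assumed. Intramodal self-attention is shown by passing one modality as queries, keys, and values; cross-modal fusion lets bbox features query velocity features; and the decision-level ensemble weights each branch's crossing probability by a softmax over negative predictive entropy, a hypothetical uncertainty measure chosen only for illustration. All function names and the toy features are likewise hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a 1-D list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention without learned projections (simplification).

    Self-attention: queries, keys, and values come from the same modality.
    Cross-modal attention: queries come from one modality, keys/values from
    the other, so each query time step aggregates the other input's features.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def binary_entropy(p: float, eps: float = 1e-12) -> float:
    """Predictive entropy (in nats) of a Bernoulli crossing probability."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def uncertainty_weighted_ensemble(probs: Sequence[float], temperature: float = 1.0) -> float:
    """Fuse per-branch crossing probabilities, trusting low-entropy branches more.

    Hypothetical weighting: softmax(-entropy / temperature) over branches.
    """
    weights = softmax([-binary_entropy(p) / temperature for p in probs])
    return sum(w * p for w, p in zip(weights, probs))


# Toy features: 3 time steps of a 2-D bbox embedding and a 2-D velocity embedding
bbox = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.3]]
speed = [[0.2, 1.0], [0.1, 0.9], [0.0, 0.8]]

bbox_ctx = attention(bbox, bbox, bbox)      # intramodal self-attention
fused = attention(bbox_ctx, speed, speed)   # cross-modal attention (bbox queries velocity)
print(len(fused), len(fused[0]))            # -> 3 2

# Hypothetical branch outputs: bbox-only, velocity-only, fused
print(round(uncertainty_weighted_ensemble([0.9, 0.55, 0.8]), 3))
```

Entropy weighting is only one way to turn uncertainty into ensemble weights; inverse-variance weighting over Monte Carlo dropout samples would be an equally plausible assumption.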
Pages: 13