Crossmodal Transformer Based Generative Framework for Pedestrian Trajectory Prediction

Cited by: 20
Authors
Su, Zhaoxin [1]
Huang, Gang [2]
Zhang, Sanyuan [1]
Hua, Wei [2]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
[2] Zhejiang Lab, Hangzhou, Peoples R China
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2022) | 2022
Funding
National Natural Science Foundation of China
Keywords
DOI
10.1109/ICRA46639.2022.9812226
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Pedestrian trajectory prediction provides guidance for collision avoidance and is therefore an important task for autonomous driving. In this paper, to produce plausible trajectory predictions in the first-person view setting, we propose a crossmodal transformer based generative framework that leverages sequences of cues from multiple modalities as well as pedestrian attributes. In the encoder, crossmodal transformers are applied over the observed (past) stage to extract cross-relation features from four modality-modality pairs; these features are then fused by a branch assigning operation and a modality attention module. In the decoder, we employ a Bézier curve interpolation based method to project encoder features into trajectory results. Our training process not only considers the pedestrian's intention to cross the road but also optimizes the model for more accurate predictions at the terminal time steps. Experimental results demonstrate that our framework outperforms state-of-the-art methods on both the JAAD and PIE datasets. In particular, compared with the best baseline, our method achieves improvements of 15.1%/14.3% on JAAD and 14.3%/22.2% on PIE for deterministic/multimodal prediction, measured by the box center final displacement error.
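The abstract names two ideas that a short example can make concrete: decoding a trajectory from a Bézier curve, and scoring it with a final displacement error. The sketch below is illustrative only and is not the authors' decoder. It assumes a set of 2D control points is already available (in the paper these would come from encoder features; the values here are hypothetical) and samples future positions with the standard Bernstein-polynomial form. The function names `bezier_point`, `sample_trajectory`, and `final_displacement_error` are assumptions made for this example.

```python
from math import comb


def bezier_point(control_points, t):
    """Evaluate a 2D Bezier curve at parameter t in [0, 1].

    Uses the Bernstein form: B(t) = sum_i C(n, i) * (1 - t)^(n - i) * t^i * P_i.
    """
    n = len(control_points) - 1
    x = y = 0.0
    for i, (px, py) in enumerate(control_points):
        weight = comb(n, i) * (1 - t) ** (n - i) * t ** i
        x += weight * px
        y += weight * py
    return (x, y)


def sample_trajectory(control_points, num_steps):
    """Sample num_steps positions at evenly spaced parameters, ending at t = 1."""
    return [bezier_point(control_points, (k + 1) / num_steps)
            for k in range(num_steps)]


def final_displacement_error(predicted, ground_truth):
    """Euclidean distance between the last predicted and last true positions."""
    (px, py), (gx, gy) = predicted[-1], ground_truth[-1]
    return ((px - gx) ** 2 + (py - gy) ** 2) ** 0.5


# Hypothetical control points (assumed for illustration, not from the paper).
control = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
trajectory = sample_trajectory(control, num_steps=4)
```

Because the curve always ends at its last control point, the final sampled position coincides with that control point. This is one reason a curve-based decoder pairs naturally with a loss that emphasizes the terminal time steps, although the exact loss used in the paper is not given in the abstract.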
Pages: 2337-2343
Page count: 7