Under the hood of transformer networks for trajectory forecasting

Times cited: 18
Authors
Franco, Luca [1]
Placidi, Leonardo [1]
Giuliari, Francesco [2]
Hasan, Irtiza [3]
Cristani, Marco [2]
Galasso, Fabio [1]
Affiliations
[1] Sapienza University of Rome, Rome, Italy
[2] University of Verona, Verona, Italy
[3] Inception Institute of Artificial Intelligence, Abu Dhabi, United Arab Emirates
Keywords
Trajectory forecasting; Human behavior; Transformer networks; BERT; Multi-modal future prediction
DOI
10.1016/j.patcog.2023.109372
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Transformer Networks have established themselves as the de facto state of the art for trajectory forecasting, but there is currently no systematic study of their capability to model the motion patterns of people without interactions with other individuals or the social context. There is abundant literature on LSTMs, CNNs and GANs on this subject. However, methods adopting Transformer techniques achieve strong performance through complex models, and a clear analysis of their adoption as plain sequence models is missing. This paper proposes the first in-depth study of Transformer Networks (TF) and Bidirectional Transformers (BERT) for forecasting the individual motion of people, without bells and whistles. We conduct an exhaustive evaluation of the input/output representations, problem formulations and sequence modelling, including a novel analysis of their capability to predict multi-modal futures. In a comparative evaluation on the ETH+UCY benchmark, both TF and BERT are top performers in predicting individual motions and remain within a narrow margin of more complex techniques, including those that exploit social interactions and scene context. Source code will be released for all conducted experiments. (c) 2023 Published by Elsevier Ltd.
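The abstract does not specify which input/output representations or evaluation metrics are used. As a minimal sketch, assuming two conventions that are common on ETH+UCY but are not confirmed by this record, the Python below shows (a) switching between absolute (x, y) positions and per-step displacements as a sequence representation, and (b) average/final displacement error (ADE/FDE) with a best-of-K protocol for scoring multi-modal predictions. All function names are hypothetical illustrations, not the authors' code.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) position in metres (assumed unit)


def to_displacements(track: List[Point]) -> List[Point]:
    """Convert absolute positions into per-step displacements (one fewer element)."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(track, track[1:])]


def from_displacements(start: Point, deltas: List[Point]) -> List[Point]:
    """Integrate displacements back into absolute positions, starting after `start`."""
    x, y = start
    out: List[Point] = []
    for dx, dy in deltas:
        x, y = x + dx, y + dy
        out.append((x, y))
    return out


def ade_fde(pred: List[Point], gt: List[Point]) -> Tuple[float, float]:
    """Average and final Euclidean displacement error between equal-length tracks."""
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(dists) / len(dists), dists[-1]


def best_of_k(samples: List[List[Point]], gt: List[Point]) -> Tuple[float, float]:
    """Best-of-K scoring: minimum ADE and minimum FDE over K sampled futures.

    The two minima are taken independently, so they may come from different samples.
    """
    scores = [ade_fde(s, gt) for s in samples]
    return min(a for a, _ in scores), min(f for _, f in scores)


if __name__ == "__main__":
    observed = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    future_gt = [(3.0, 0.0), (4.0, 0.0)]
    # A constant-velocity guess built in displacement space, plus a worse alternative.
    cv_guess = from_displacements(observed[-1], to_displacements(observed))
    drifting = [(3.0, 1.0), (4.0, 2.0)]
    print(ade_fde(drifting, future_gt))                 # (1.5, 2.0)
    print(best_of_k([drifting, cv_guess], future_gt))   # (0.0, 0.0)
```

Predicting displacements rather than absolute coordinates makes the target translation-invariant, which is one reason the choice of representation can matter for a plain sequence model; whether this paper adopts that choice is not stated here.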
Pages: 10
References: 39