Future pedestrian location prediction in first-person videos for autonomous vehicles and social robots

Cited by: 6

Authors:

Chen, Kai [1]

Zhu, Haihua [1]

Tang, Dunbing [1]

Zheng, Kun [2]

Affiliations:

[1] Nanjing Univ Aeronaut & Astronaut NUAA, Coll Mech & Elect Engn, Nanjing, Peoples R China

[2] Nanjing Inst Technol Nanjing, Sch Automot & Rail Transit, Nanjing, Peoples R China

Source:

IMAGE AND VISION COMPUTING | 2023 / Vol. 134

Funding:

National Natural Science Foundation of China; China Postdoctoral Science Foundation;

Keywords:

Social intention; Human-vehicle interactions; First-person videos; Image depth; Social spatial dependencies; Transformer;

DOI:

10.1016/j.imavis.2023.104671

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Future pedestrian trajectory prediction in first-person videos offers great prospects to help autonomous vehicles and social robots to enable better human-vehicle interactions. Given an egocentric video stream, we aim to predict the location and depth (distance between the observed person and the camera) of his/her neighbors in future frames. To locate their future trajectories, we consider three main factors: a) It is necessary to restore the spatial distribution of pedestrians from the 2D image to 3D space, i.e., to extract the distance between the pedestrian and the camera, which is often neglected. b) It is critical to utilize neighbors' poses to recognize their intentions. c) It is important to learn human-vehicle interactions from the pedestrian's historical trajectories. We propose to incorporate these three factors into a multi-channel tensor to represent the main features in real-life 3D space. We then put this tensor into an innovative end-to-end fully convolutional network based on transformer architecture. Experimental results reveal our method outperforms other state-of-the-art methods on public benchmarks MOT15, MOT16 and MOT17. The proposed method will be useful to understand human-vehicle interaction and helpful for pedestrian collision avoidance. (c) 2023 Elsevier B.V. All rights reserved.


Pages: 11

