Future pedestrian location prediction in first-person videos for autonomous vehicles and social robots

Cited by: 6

Authors:

Chen, Kai [1]

Zhu, Haihua [1]

Tang, Dunbing [1]

Zheng, Kun [2]

Affiliations:

[1] Nanjing Univ Aeronaut & Astronaut NUAA, Coll Mech & Elect Engn, Nanjing, Peoples R China

[2] Nanjing Inst Technol Nanjing, Sch Automot & Rail Transit, Nanjing, Peoples R China

Source:

IMAGE AND VISION COMPUTING | 2023 / Vol. 134

Funding:

National Natural Science Foundation of China; China Postdoctoral Science Foundation;

Keywords:

Social intention; Human-vehicle interactions; First-person videos; Image depth; Social spatial dependencies; Transformer;

DOI:

10.1016/j.imavis.2023.104671

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Future pedestrian trajectory prediction in first-person videos offers great prospects for helping autonomous vehicles and social robots achieve better human-vehicle interactions. Given an egocentric video stream, we aim to predict the location and depth (distance between the observed person and the camera) of his/her neighbors in future frames. To locate their future trajectories, we consider three main factors: a) It is necessary to restore the spatial distribution of pedestrians from the 2D image to 3D space, i.e., to extract the distance between the pedestrian and the camera, which is often neglected. b) It is critical to utilize neighbors' poses to recognize their intentions. c) It is important to learn human-vehicle interactions from the pedestrian's historical trajectories. We propose to incorporate these three factors into a multi-channel tensor to represent the main features in real-life 3D space. We then feed this tensor into an innovative end-to-end fully convolutional network based on the transformer architecture. Experimental results show that our method outperforms other state-of-the-art methods on the public benchmarks MOT15, MOT16 and MOT17. The proposed method will be useful for understanding human-vehicle interaction and helpful for pedestrian collision avoidance. © 2023 Elsevier B.V. All rights reserved.


Pages: 11

