STINet: Spatio-Temporal-Interactive Network for Pedestrian Detection and Trajectory Prediction

Cited by: 29
Authors
Zhang, Zhishuai [1 ,2 ]
Gao, Jiyang [1 ]
Mao, Junhua [1 ]
Liu, Yukai [1 ]
Anguelov, Dragomir [1 ]
Li, Congcong [1 ]
Affiliations
[1] Waymo LLC, Mountain View, CA 94043 USA
[2] Johns Hopkins Univ, Baltimore, MD 21218 USA
Source
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020) | 2020
Keywords
DOI
10.1109/CVPR42600.2020.01136
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Detecting pedestrians and predicting their future trajectories are critical tasks for numerous applications, such as autonomous driving. Previous methods either treat detection and prediction as separate tasks or simply add a trajectory regression head on top of a detector. In this work, we present a novel end-to-end two-stage network: the Spatio-Temporal-Interactive Network (STINet). In addition to 3D geometry modeling of pedestrians, we model the temporal information for each pedestrian. To do so, our method predicts both current and past locations in the first stage, so that each pedestrian can be linked across frames and comprehensive spatio-temporal information can be captured in the second stage. We also model the interaction among objects with an interaction graph, gathering information from neighboring objects. Comprehensive experiments on the Lyft Dataset and the recently released large-scale Waymo Open Dataset, for both object detection and future trajectory prediction, validate the effectiveness of the proposed method. On the Waymo Open Dataset, we achieve a bird's-eye-view (BEV) detection AP of 80.73 and a trajectory prediction average displacement error (ADE) of 33.67 cm for pedestrians, establishing the state of the art for both tasks.
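The average displacement error (ADE) quoted in the abstract is the mean Euclidean distance between predicted and ground-truth positions, averaged over all future timesteps and all pedestrians. Below is a minimal sketch of that metric; the array shapes, units, and the averaging convention are illustrative assumptions, not details taken from the paper.

import numpy as np

def average_displacement_error(pred, gt):
    """Average displacement error (ADE) between predicted and ground-truth trajectories.

    pred, gt: arrays of shape (num_pedestrians, num_future_steps, 2) holding
    (x, y) positions in a common frame (assumed here: bird's-eye view, meters).
    """
    # L2 distance between predicted and true position at every future timestep.
    displacement = np.linalg.norm(pred - gt, axis=-1)  # (num_pedestrians, num_future_steps)
    # Average over all timesteps and all pedestrians.
    return displacement.mean()

# Toy usage: two pedestrians, three predicted future steps each.
pred = np.array([[[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]],
                 [[0.0, 1.0], [0.0, 2.0], [0.0, 3.0]]])
gt = pred + 0.3  # constant (0.3, 0.3) offset -> ADE of sqrt(2) * 0.3 ~= 0.42
print(average_displacement_error(pred, gt))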
Pages: 11343 - 11352
Number of pages: 10