STINet: Spatio-Temporal-Interactive Network for Pedestrian Detection and Trajectory Prediction

Cited by: 29
Authors
Zhang, Zhishuai [1 ,2 ]
Gao, Jiyang [1 ]
Mao, Junhua [1 ]
Liu, Yukai [1 ]
Anguelov, Dragomir [1 ]
Li, Congcong [1 ]
Affiliations
[1] Waymo LLC, Mountain View, CA 94043 USA
[2] Johns Hopkins Univ, Baltimore, MD 21218 USA
Source
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020) | 2020
Keywords
DOI
10.1109/CVPR42600.2020.01136
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Detecting pedestrians and predicting their future trajectories are critical tasks for numerous applications, such as autonomous driving. Previous methods either treat detection and prediction as separate tasks or simply add a trajectory regression head on top of a detector. In this work, we present a novel end-to-end two-stage network: the Spatio-Temporal-Interactive Network (STINet). In addition to 3D geometry modeling of pedestrians, we model the temporal information for each pedestrian. To do so, our method predicts both current and past locations in the first stage, so that each pedestrian can be linked across frames and comprehensive spatio-temporal information can be captured in the second stage. We also model the interaction among objects with an interaction graph, gathering information from neighboring objects. Comprehensive experiments on the Lyft Dataset and the recently released large-scale Waymo Open Dataset, for both object detection and future trajectory prediction, validate the effectiveness of the proposed method. On the Waymo Open Dataset, we achieve a bird's-eye-view (BEV) detection AP of 80.73 and a trajectory prediction average displacement error (ADE) of 33.67 cm for pedestrians, establishing the state of the art for both tasks.
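The average displacement error (ADE) quoted in the abstract is the mean Euclidean distance between predicted and ground-truth positions, averaged over all future timesteps and all pedestrians. Below is a minimal sketch of that metric; the array shapes, units, and the averaging convention are illustrative assumptions, not details taken from the paper.

import numpy as np

def average_displacement_error(pred, gt):
    """Average displacement error (ADE) between predicted and ground-truth trajectories.

    pred, gt: arrays of shape (num_pedestrians, num_future_steps, 2) holding
    (x, y) positions in a common frame (assumed here: bird's-eye view, meters).
    """
    # L2 distance between predicted and true position at every future timestep.
    displacement = np.linalg.norm(pred - gt, axis=-1)  # (num_pedestrians, num_future_steps)
    # Average over all timesteps and all pedestrians.
    return displacement.mean()

# Toy usage: two pedestrians, three predicted future steps each.
pred = np.array([[[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]],
                 [[0.0, 1.0], [0.0, 2.0], [0.0, 3.0]]])
gt = pred + 0.3  # constant (0.3, 0.3) offset -> ADE of sqrt(2) * 0.3 ~= 0.42
print(average_displacement_error(pred, gt))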
Pages: 11343 - 11352
Number of pages: 10