DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents

Cited by: 201

Authors:

Lee, Namhoon [1]

Choi, Wongun [2]

Vernaza, Paul [2]

Choy, Christopher B. [3]

Torr, Philip H. S. [1]

Chandraker, Manmohan [2,4]

Affiliations:

[1] Univ Oxford, Oxford, England

[2] NEC Labs Amer, Irving, TX USA

[3] Stanford Univ, Stanford, CA 94305 USA

[4] Univ Calif San Diego, San Diego, CA 92103 USA

Source:

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) | 2017

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK;

Keywords:

MODELS;

DOI:

10.1109/CVPR.2017.233

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We introduce a Deep Stochastic IOC RNN Encoder-decoder framework, DESIRE, for the task of future prediction of multiple interacting agents in dynamic scenes. DESIRE effectively predicts future locations of objects in multiple scenes by 1) accounting for the multi-modal nature of future prediction (i.e., given the same context, the future may vary), 2) foreseeing the potential future outcomes and making a strategic prediction based on them, and 3) reasoning not only from the past motion history, but also from the scene context and the interactions among the agents. DESIRE achieves these in a single end-to-end trainable neural network model, while being computationally efficient. The model first obtains a diverse set of hypothetical future prediction samples using a conditional variational auto-encoder; these samples are then ranked and refined by an RNN scoring-regression module. Samples are scored by accounting for accumulated future rewards, which enables better long-term strategic decisions, similar to IOC frameworks. An RNN scene context fusion module jointly captures past motion histories, the semantic scene context, and interactions among multiple agents. A feedback mechanism iterates over the ranking and refinement to further boost the prediction accuracy. We evaluate our model on two publicly available datasets: KITTI and the Stanford Drone Dataset. Our experiments show that the proposed model significantly improves the prediction accuracy compared to other baseline methods.
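The sample-then-rank idea in the abstract can be illustrated with a small toy sketch. This is not the authors' model: the CVAE sampler is replaced by a hypothetical random perturbation of the last observed velocity, the learned RNN scoring module is replaced by a hand-written reward (`toy_reward`, an assumption made for illustration), and the refinement and feedback steps are omitted. All function names here are invented for the example.

```python
import random


def sample_hypotheses(past, k, horizon, rng):
    """Toy stand-in for the CVAE sampler: draw k diverse future trajectories.

    Each hypothesis extrapolates the last observed velocity and perturbs it
    with a random offset (hypothetical; the paper learns a latent variable).
    """
    (x0, y0), (x1, y1) = past[-2], past[-1]
    vx, vy = x1 - x0, y1 - y0
    hypotheses = []
    for _ in range(k):
        zx, zy = rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)
        traj = [(x1 + (vx + zx) * t, y1 + (vy + zy) * t)
                for t in range(1, horizon + 1)]
        hypotheses.append(traj)
    return hypotheses


def accumulated_reward(traj, reward_fn):
    """Score a trajectory by summing the per-step reward over all future steps."""
    return sum(reward_fn(point) for point in traj)


def rank_hypotheses(hypotheses, reward_fn):
    """Sort hypotheses from highest to lowest accumulated reward."""
    return sorted(hypotheses,
                  key=lambda traj: accumulated_reward(traj, reward_fn),
                  reverse=True)


def toy_reward(point):
    """Hypothetical scene reward: staying close to the line y = 0 is better."""
    return -abs(point[1])


rng = random.Random(0)
past = [(0.0, 0.0), (1.0, 0.0)]  # two observed positions, moving along +x
hyps = sample_hypotheses(past, k=5, horizon=4, rng=rng)
ranked = rank_hypotheses(hyps, toy_reward)
best = ranked[0]
```

Scoring by the sum over the whole horizon, rather than by the next step alone, is what lets a hypothesis with a slightly worse first step outrank one that drifts away later.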

References

Pages: 2165-2174

Page count: 10

52 entries in total

[31] Karpathy A, 2015, PROC CVPR IEEE, P3128, DOI 10.1109/CVPR.2015.7298932

[32] King DB, 2015, ACS SYM SER, V1214, P1

[33] Kitani KM, 2012, LECT NOTES COMPUT SC, V7575, P201, DOI 10.1007/978-3-642-33765-9_15

[34] Kooij JFP, 2014, LECT NOTES COMPUT SC, V8694, P618, DOI 10.1007/978-3-319-10599-4_40

[35] Kretzschmar H, 2014, IEEE INT CONF ROBOT, P4015, DOI 10.1109/ICRA.2014.6907442

[36] Lee N., 2016, 2016 IEEE WINTER C A, P1, DOI 10.1109/WACV.2016.7477732

[37] Littman M. L., Markov games as a framework for multi-agent reinforcement learning

[38] Mainprice Jim, 2016, ARXIV160602111

[39] Pellegrini S., Ess A., Schindler K., van Gool L., 2009, You'll Never Walk Alone: Modeling Social Behavior for Multi-target Tracking, 2009 IEEE 12TH INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), P261-268

[40] Priestley M. B., 1981, Spectral Analysis and Time Series. Probability and Mathematical Statistics: A Series of Monographs and Textbooks
