DySeT: A Dynamic Masked Self-distillation Approach for Robust Trajectory Prediction

Cited by: 0
Authors
Pourkeshavarz, Mozhgan [1]
Zhang, Junrui [1]
Rasouli, Amir [1]
Affiliations
[1] Huawei, Noah's Ark Lab, Montreal, PQ, Canada
Source
COMPUTER VISION - ECCV 2024, PT III | 2025 / Vol. 15061
DOI
10.1007/978-3-031-72646-0_19
CLC Classification
TP18 [Artificial intelligence theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The lack of generalization capability in behavior prediction models for autonomous vehicles is a crucial concern for safe motion planning. One way to address this is self-supervised pre-training through masked trajectory prediction. However, existing models rely on uniform random sampling of tokens, which is sub-optimal because it implies that all components of driving scenes are equally informative. In this paper, to enable more robust representation learning, we introduce a dynamic masked self-distillation approach to identify and utilize informative aspects of the scenes, particularly those corresponding to complex driving behaviors, such as overtaking. Specifically, for targeted sampling, we propose a dynamic method that prioritizes tokens, such as trajectory or lane segments, based on their informativeness. The latter is determined via an auxiliary network that estimates token distributions. Through sampler optimization, more informative tokens are rewarded and selected as visible via a policy-gradient algorithm adapted from reinforcement learning. In addition, we propose a masked self-distillation approach to transfer knowledge from fully visible to masked scene representations. The distillation process not only enriches the semantic information within the visible token set but also progressively refines the sampling process. Further, we use an integrated training regime to enhance the model's ability to learn meaningful representations from informative tokens. Our extensive evaluation on two large-scale trajectory prediction datasets demonstrates the superior performance of the proposed method and its improved prediction robustness across different scenarios.
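The sampler optimization sketched in the abstract — rewarding informative tokens and selecting them as visible via a policy gradient — can be illustrated with a minimal REINFORCE-style toy. This is not the paper's implementation: the 8-token scene, the binary reward (a stand-in for the distillation-loss improvement the paper would measure), and the learning rate are all hypothetical simplifications.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def sample_visible(logits, k, rng):
    # Draw k visible tokens (without replacement) from the sampler's
    # categorical distribution over scene tokens.
    p = softmax(logits)
    idx = rng.choice(len(logits), size=k, replace=False, p=p)
    return idx, p

def reinforce_step(logits, idx, p, reward, lr=0.1):
    # REINFORCE update: move logits along reward * grad(log p(sampled)).
    # Treating the without-replacement draw as independent categorical
    # samples is a simplification for this sketch.
    grad = np.zeros_like(logits)
    for i in idx:
        grad -= p          # d log p_i / d logits = onehot_i - p
        grad[i] += 1.0
    return logits + lr * reward * grad

# Toy scene with 8 tokens; token 3 stands in for an "informative" segment
# (e.g. an overtaking trajectory). Reward 1 when it stays visible.
logits = np.zeros(8)
for _ in range(300):
    idx, p = sample_visible(logits, k=3, rng=rng)
    reward = 1.0 if 3 in idx else 0.0
    logits = reinforce_step(logits, idx, p, reward)

probs = softmax(logits)
```

After training, `probs` concentrates on the informative token, mirroring how the paper's sampler learns to keep informative trajectory and lane segments visible during masked pre-training.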
Pages: 324-342
Page count: 19