DySeT: A Dynamic Masked Self-distillation Approach for Robust Trajectory Prediction

Cited by: 0
Authors
Pourkeshavarz, Mozhgan [1]
Zhang, Junrui [1]
Rasouli, Amir [1]
Affiliations
[1] Huawei, Noah's Ark Lab, Montreal, PQ, Canada
Source
COMPUTER VISION - ECCV 2024, PT III | 2025 / Vol. 15061
DOI
10.1007/978-3-031-72646-0_19
CLC Classification
TP18 [Artificial intelligence theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The lack of generalization capability in behavior prediction models for autonomous vehicles is a crucial concern for safe motion planning. One way to address this is self-supervised pre-training through masked trajectory prediction. However, existing models rely on uniform random sampling of tokens, which is sub-optimal because it implies that all components of driving scenes are equally informative. In this paper, to enable more robust representation learning, we introduce a dynamic masked self-distillation approach to identify and utilize informative aspects of the scenes, particularly those corresponding to complex driving behaviors, such as overtaking. Specifically, for targeted sampling, we propose a dynamic method that prioritizes tokens, such as trajectory or lane segments, based on their informativeness. The latter is determined via an auxiliary network that estimates token distributions. Through sampler optimization, more informative tokens are rewarded and selected as visible via a policy-gradient algorithm adapted from reinforcement learning. In addition, we propose a masked self-distillation approach to transfer knowledge from fully visible to masked scene representations. The distillation process not only enriches the semantic information within the visible token set but also progressively refines the sampling process. Further, we use an integrated training regime to enhance the model's ability to learn meaningful representations from informative tokens. Our extensive evaluation on two large-scale trajectory prediction datasets demonstrates the superior performance of the proposed method and its improved prediction robustness across different scenarios.
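The sampler optimization sketched in the abstract — rewarding informative tokens and selecting them as visible via a policy gradient — can be illustrated with a minimal REINFORCE-style toy. This is not the paper's implementation: the 8-token scene, the binary reward (a stand-in for the distillation-loss improvement the paper would measure), and the learning rate are all hypothetical simplifications.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def sample_visible(logits, k, rng):
    # Draw k visible tokens (without replacement) from the sampler's
    # categorical distribution over scene tokens.
    p = softmax(logits)
    idx = rng.choice(len(logits), size=k, replace=False, p=p)
    return idx, p

def reinforce_step(logits, idx, p, reward, lr=0.1):
    # REINFORCE update: move logits along reward * grad(log p(sampled)).
    # Treating the without-replacement draw as independent categorical
    # samples is a simplification for this sketch.
    grad = np.zeros_like(logits)
    for i in idx:
        grad -= p          # d log p_i / d logits = onehot_i - p
        grad[i] += 1.0
    return logits + lr * reward * grad

# Toy scene with 8 tokens; token 3 stands in for an "informative" segment
# (e.g. an overtaking trajectory). Reward 1 when it stays visible.
logits = np.zeros(8)
for _ in range(300):
    idx, p = sample_visible(logits, k=3, rng=rng)
    reward = 1.0 if 3 in idx else 0.0
    logits = reinforce_step(logits, idx, p, reward)

probs = softmax(logits)
```

After training, `probs` concentrates on the informative token, mirroring how the paper's sampler learns to keep informative trajectory and lane segments visible during masked pre-training.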
Pages: 324-342
Page count: 19