T-MAE : Temporal Masked Autoencoders for Point Cloud Representation Learning

Cited by: 0
Authors
Wei, Weijie [1]
Nejadasl, Fatemeh Karimi [1]
Gevers, Theo [1]
Oswald, Martin R. [1]
Affiliations
[1] Univ Amsterdam, Amsterdam, Netherlands
Source
Keywords
Self-supervised learning; LiDAR point cloud; 3D detection; NETWORKS;
DOI
10.1007/978-3-031-73247-8_11
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The scarcity of annotated data in LiDAR point cloud understanding hinders effective representation learning. Consequently, scholars have been actively investigating efficacious self-supervised pre-training paradigms. Nevertheless, temporal information, which is inherent in the LiDAR point cloud sequence, is consistently disregarded. To better utilize this property, we propose an effective pre-training strategy, namely Temporal Masked Auto-Encoders (T-MAE), which takes as input temporally adjacent frames and learns temporal dependency. A SiamWCA backbone, containing a Siamese encoder and a windowed cross-attention (WCA) module, is established for the two-frame input. Considering that the movement of an ego-vehicle alters the view of the same instance, temporal modeling also serves as a robust and natural data augmentation, enhancing the comprehension of target objects. SiamWCA is a powerful architecture but heavily relies on annotated data. Our T-MAE pre-training strategy alleviates its demand for annotated data. Comprehensive experiments demonstrate that T-MAE achieves the best performance on both Waymo and ONCE datasets among competitive self-supervised approaches.
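The two-frame design described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the one-layer `shared_encoder`, the random window assignment, the 0.75 mask ratio, and the choice to mask only the current frame are all assumptions made for illustration. It shows only the general idea of a Siamese (weight-shared) encoder followed by cross-attention restricted to matching windows.

```python
import numpy as np

def shared_encoder(tokens, weight):
    """Siamese stand-in: the same weight matrix embeds both frames (hypothetical one-layer encoder)."""
    return np.tanh(tokens @ weight)

def windowed_cross_attention(query_feats, key_feats, query_win, key_win):
    """Each query token attends only to key tokens that share its window id."""
    dim = query_feats.shape[1]
    out = np.zeros_like(query_feats)
    for w in np.unique(query_win):
        q_idx = np.where(query_win == w)[0]
        k_idx = np.where(key_win == w)[0]
        if k_idx.size == 0:
            # No previous-frame tokens in this window: pass the query features through.
            out[q_idx] = query_feats[q_idx]
            continue
        scores = query_feats[q_idx] @ key_feats[k_idx].T / np.sqrt(dim)
        scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
        attn = np.exp(scores)
        attn /= attn.sum(axis=1, keepdims=True)
        out[q_idx] = query_feats[q_idx] + attn @ key_feats[k_idx]  # residual connection
    return out

def random_mask(num_tokens, mask_ratio, rng):
    """Boolean array where True marks a token that stays visible."""
    num_keep = max(1, int(round(num_tokens * (1.0 - mask_ratio))))
    keep = np.zeros(num_tokens, dtype=bool)
    keep[rng.choice(num_tokens, size=num_keep, replace=False)] = True
    return keep

rng = np.random.default_rng(0)
num_tokens, in_dim, emb_dim, num_windows = 32, 4, 8, 4

# Toy "frames": the current frame is a slightly perturbed copy of the previous one.
prev_frame = rng.normal(size=(num_tokens, in_dim))
curr_frame = prev_frame + 0.05 * rng.normal(size=(num_tokens, in_dim))
prev_win = rng.integers(0, num_windows, size=num_tokens)  # assumed window ids
curr_win = prev_win.copy()

weight = rng.normal(size=(in_dim, emb_dim)) * 0.5
keep = random_mask(num_tokens, mask_ratio=0.75, rng=rng)  # assumed mask ratio

prev_feats = shared_encoder(prev_frame, weight)        # previous frame, fully visible
curr_feats = shared_encoder(curr_frame[keep], weight)  # current frame, visible tokens only
fused = windowed_cross_attention(curr_feats, prev_feats, curr_win[keep], prev_win)
print(fused.shape)
```

In a masked-autoencoder setup, a decoder would then reconstruct the masked current-frame tokens from `fused`; that step is omitted here because the abstract does not describe it.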
Pages: 178-195
Page count: 18