Self-supervised learning of rotation-invariant 3D point set features using transformer and its self-distillation

Cited by: 0

Authors:

Furuya, Takahiko [1]

Chen, Zhoujie [1,2]

Ohbuchi, Ryutarou [1]

Kuang, Zhenzhong [2]

Affiliations:

[1] Univ Yamanashi, Dept Comp Sci & Engn, 4-3-11 Takeda, Kofu, Yamanashi 4008511, Japan

[2] Hangzhou Dianzi Univ, Sch Comp Sci, Hangzhou 310000, Peoples R China

Source:

COMPUTER VISION AND IMAGE UNDERSTANDING | 2024 / Volume 244

Funding:

Japan Society for the Promotion of Science;

Keywords:

Deep learning; Self-supervised learning; 3D point set; Feature representation; Rotation invariance;

DOI:

10.1016/j.cviu.2024.104025

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Invariance against rotations of 3D objects is an important property in analyzing 3D point set data. Conventional rotation-invariant DNNs for 3D point sets typically obtain accurate 3D shape features via supervised learning, using labeled 3D point sets as training samples. However, due to the rapid increase in 3D point set data and the high cost of labeling, a framework to learn rotation-invariant 3D shape features from numerous unlabeled 3D point sets is required. This paper proposes a novel self-supervised learning framework for acquiring accurate and rotation-invariant 3D point set features at the object level. Our proposed lightweight DNN architecture decomposes an input 3D point set into multiple global-scale regions, called tokens, that preserve the spatial layout of partial shapes composing the 3D object. We employ a self-attention mechanism to refine the tokens and aggregate them into an expressive rotation-invariant feature per 3D point set. Our DNN is effectively trained by using pseudo-labels generated by a self-distillation framework. To facilitate the learning of accurate features, we propose to combine multi-crop and cut-mix data augmentation techniques to diversify 3D point sets for training. Through a comprehensive evaluation, we empirically demonstrate that (1) existing rotation-invariant DNN architectures designed for supervised learning do not necessarily learn accurate 3D shape features under a self-supervised learning scenario, and (2) our proposed algorithm learns rotation-invariant 3D point set features that are more accurate than those learned by existing algorithms.
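
To make the notion of rotation invariance in the abstract concrete, the following is a minimal NumPy sketch. It is not the paper's architecture or feature: the function names (`random_rotation`, `sorted_centroid_distances`) and the toy feature itself are assumptions chosen only for illustration. It shows that a feature built from distances to the centroid is unchanged when the point set is rotated, whereas the raw coordinates are not.

```python
import numpy as np


def random_rotation(rng):
    """Return a random 3x3 rotation matrix (orthogonal, determinant +1)."""
    # QR decomposition of a Gaussian matrix gives an orthogonal factor q.
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # remove the sign ambiguity of the QR factors
    if np.linalg.det(q) < 0:     # turn a reflection into a proper rotation
        q[:, 0] = -q[:, 0]
    return q


def sorted_centroid_distances(points):
    """Toy rotation-invariant feature (illustrative, not the paper's method).

    Distances from each point to the centroid do not change under rotation;
    sorting them also removes the dependence on point order.
    """
    centered = points - points.mean(axis=0)
    return np.sort(np.linalg.norm(centered, axis=1))


rng = np.random.default_rng(0)
points = rng.normal(size=(128, 3))      # a synthetic 3D point set
rotation = random_rotation(rng)
rotated = points @ rotation.T           # rotate every point

same_feature = np.allclose(
    sorted_centroid_distances(points), sorted_centroid_distances(rotated)
)
same_coords = np.allclose(points, rotated)
print("feature unchanged by rotation:", same_feature)
print("raw coordinates unchanged by rotation:", same_coords)
```

Such a hand-crafted feature discards most shape information, which is one reason learned rotation-invariant features of the kind the abstract describes are of interest.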


Pages: 14
