Canonical Shape Projection Is All You Need for 3D Few-Shot Class Incremental Learning

Cited by: 0
Authors
Cheraghian, Ali [1 ,2 ]
Hayder, Zeeshan [1 ,2 ]
Ramasinghe, Sameera [3 ]
Rahman, Shafin [4 ]
Jafaryahya, Javad [5 ]
Petersson, Lars [2 ]
Harandi, Mehrtash [6 ]
Affiliations
[1] CSIRO, Data61, Sydney, NSW, Australia
[2] Australian Natl Univ, Canberra, ACT, Australia
[3] Amazon, Seattle, WA USA
[4] North South Univ, Dhaka, Bangladesh
[5] Univ Technol Sydney, Ultimo, Australia
[6] Monash Univ, Melbourne, Vic, Australia
Source
COMPUTER VISION - ECCV 2024, PT XLI | 2025 / Vol. 15099
Funding
Australian Research Council
Keywords
3D shape projection; Model reprogramming; Few-shot class incremental learning;
DOI
10.1007/978-3-031-72940-9_3
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, robust pre-trained foundation models have been successfully applied to many downstream tasks. Here, we leverage such models for few-shot class-incremental learning (FSCIL) on 3D point cloud objects. Our approach reprograms the well-known CLIP foundation model (trained on paired 2D images and text) for this purpose. Since CLIP ingests 2D images, we project the 3D object point cloud onto the 2D image plane to create suitable depth maps. Prior works use a fixed, non-trainable set of camera poses for this projection; in contrast, we train the network to find a projection that best describes the object and is well suited to extracting 2D image features with the CLIP vision encoder. Because the generated depth map is not directly suitable for CLIP, we apply the model reprogramming paradigm, augmenting the depth map's foreground and background to adapt it. This removes the need to modify or fine-tune the foundation model. In our setting, access to data from novel classes is limited, which leads to overfitting; we address this with a prompt engineering approach based on multiple GPT-generated text descriptions. Our method, C3PR, outperforms existing FSCIL methods on the ModelNet, ShapeNet, ScanObjectNN, and CO3D datasets. The code is available at https://github.com/alichr/C3PR.
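To illustrate the projection step the abstract describes, the following is a minimal sketch (not the authors' code) of rasterizing a 3D point cloud into a 2D depth map of the kind that could be fed to a frozen CLIP vision encoder. The orthographic camera model, fixed rotation, and 64x64 resolution are assumptions for illustration; in C3PR the projection itself is learned.

```python
import numpy as np

def point_cloud_to_depth_map(points, rotation, resolution=64):
    """Rotate a point cloud by a camera rotation (fixed here; learnable in
    the paper's setting) and rasterize it into a depth map, keeping the
    nearest point per pixel (a simple z-buffer)."""
    pts = points @ rotation.T                        # apply camera pose
    xy = pts[:, :2]                                  # orthographic projection
    xy = (xy - xy.min(0)) / (xy.max(0) - xy.min(0) + 1e-8)  # normalize to [0, 1]
    px = np.clip((xy * (resolution - 1)).astype(int), 0, resolution - 1)
    depth = np.full((resolution, resolution), np.inf)
    for (x, y), z in zip(px, pts[:, 2]):
        depth[y, x] = min(depth[y, x], z)            # keep closest point
    depth[np.isinf(depth)] = 0.0                     # empty pixels -> background
    return depth

# Toy usage: a random "object" of 2048 points under an identity camera pose
cloud = np.random.rand(2048, 3)
dmap = point_cloud_to_depth_map(cloud, np.eye(3))
print(dmap.shape)  # (64, 64)
```

The model reprogramming step would then perturb the foreground and background of such a depth map (rather than fine-tuning CLIP itself) so the frozen encoder produces useful features.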
Pages: 36 - 53 (18 pages)