MD-Mamba: Feature extractor on 3D representation with multi-view depth

Cited by: 1

Authors:

Li, Qihui ^{[1]}

Li, Zongtan ^{[2]}

Tian, Lianfang ^{[1,3,4]}

Du, Qiliang ^{[1,3,4]}

Lu, Guoyu ^{[2]}

Affiliations:

[1] South China Univ Technol, Sch Automat Sci & Engn, Guangzhou, Peoples R China

[2] Univ Georgia, Sch Elect & Comp Engn, Athens, GA USA

[3] Guangdong Engn Res Ctr Cloud Edge End Collaborat T, Guangzhou, Peoples R China

[4] Minist Nat Resources, Key Lab Marine Environm Survey Technol & Applicat, Guangzhou, Peoples R China

Source:

IMAGE AND VISION COMPUTING | 2025 / Vol. 154

Funding:

National Key Research and Development Program of China;

Keywords:

Contrastive learning; Point cloud segmentation; Multi-modal learning; CLOUD; CLASSIFICATION;

DOI:

10.1016/j.imavis.2024.105396

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

3D sensors provide rich depth information and are widely used across various fields, making 3D vision a hot topic of research. Point cloud data, as a crucial type of 3D data, offers precise three-dimensional coordinate information and is extensively utilized in numerous domains, especially in robotics. However, the unordered and unstructured nature of point cloud data poses a significant challenge for feature extraction. Traditional methods have relied on designing complex local feature extractors to achieve feature extraction, but these approaches have reached a performance bottleneck. To address these challenges, this paper introduces MD-Mamba, a novel network that enhances point cloud feature extraction by integrating multi-view depth maps. Our approach leverages multi-modal learning, treating the multi-view depth maps as an additional global feature modality. By fusing these with locally extracted point cloud features, we achieve richer and more distinctive representations. We utilize an innovative feature extraction strategy, performing real projections of point clouds and treating multi-view projections as video streams. This method captures dynamic features across viewpoints using a specially designed Mamba network. Additionally, the incorporation of the Siamese Cluster module optimizes feature spacing, improving class differentiation. Extensive evaluations on ModelNet40, ShapeNetPart, and ScanObjectNN datasets validate the effectiveness of MD-Mamba, setting a new benchmark for multi-modal feature extraction in point cloud analysis.
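The idea of projecting a point cloud into multi-view depth maps and reading the views as a frame sequence can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `render_depth_views`, the orthographic camera model, the rotation about the z-axis, and the "closeness" encoding (nearer points get larger values, empty pixels stay 0) are all assumptions chosen for illustration; the abstract does not specify any of them.

```python
import numpy as np


def render_depth_views(points, num_views=4, resolution=32):
    """Hypothetical helper: orthographic depth maps from views rotated about z.

    points: array of shape (N, 3).
    Returns an array of shape (num_views, resolution, resolution), which can be
    read as a short "video" whose frames are the successive viewpoints.
    """
    # Center the cloud and scale it into the unit sphere (an assumption).
    pts = points - points.mean(axis=0)
    scale = np.linalg.norm(pts, axis=1).max()
    if scale > 0:
        pts = pts / scale

    frames = np.zeros((num_views, resolution, resolution), dtype=np.float64)
    for v in range(num_views):
        theta = 2.0 * np.pi * v / num_views
        c, s = np.cos(theta), np.sin(theta)
        rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        p = pts @ rot.T  # rotate the cloud instead of moving the camera

        # The camera looks along +x; y maps to columns and z to rows.
        cols = np.rint((p[:, 1] + 1.0) * 0.5 * (resolution - 1)).astype(int)
        rows = np.rint((1.0 - p[:, 2]) * 0.5 * (resolution - 1)).astype(int)
        cols = np.clip(cols, 0, resolution - 1)
        rows = np.clip(rows, 0, resolution - 1)

        # Closeness in [0, 2]: points nearer the camera plane x = -1 score higher.
        closeness = 1.0 - p[:, 0]
        # Z-buffer: keep the nearest point (largest closeness) per pixel.
        np.maximum.at(frames[v], (rows, cols), closeness)
    return frames
```

A sequence model would then consume `frames` along its first axis, one view per time step; how MD-Mamba actually encodes and fuses the views is described in the paper itself, not here.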

References

Pages: 12

97 items in total

[11] Decoupling Zero-Shot Semantic Segmentation [J]. Ding, Jian; Xue, Nan; Xia, Gui-Song; Dai, Dengxin. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, :11573-11582

[12] Dosovitskiy A, 2021, Arxiv, DOI [arXiv:2010.11929, DOI 10.48550/ARXIV.2010.11929]

[13] Fu Rao, 2022, ADV NEUR IN

[14] LFT-Net: Local Feature Transformer Network for Point Clouds Analysis [J]. Gao, Yongbin; Liu, Xuebing; Li, Jun; Fang, Zhijun; Jiang, Xiaoyan; Huq, Kazi Mohammed Saidul. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2023, 24 (02) :2158-2168

[15] Goyal A, 2021, PR MACH LEARN RES, V139

[16] Gu A., 2021, ICLR

[17] Gu A, 2024, Arxiv, DOI arXiv:2312.00752

[18] Gu AL, 2022, Arxiv, DOI arXiv:2206.11893

[19] PCT: Point cloud transformer [J]. Guo, Meng-Hao; Cai, Jun-Xiong; Liu, Zheng-Ning; Mu, Tai-Jiang; Martin, Ralph R.; Hu, Shi-Min. COMPUTATIONAL VISUAL MEDIA, 2021, 7 (02) :187-199

[20] UniTR: A Unified TRansformer-Based Framework for Co-Object and Multi-Modal Saliency Detection [J]. Guo, Ruohao; Ying, Xianghua; Qi, Yanyu; Qu, Liao. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 :7622-7635
