Multi-Modal 3D Shape Clustering with Dual Contrastive Learning

Cited by: 5
Authors
Lin, Guoting [1]
Zheng, Zexun [1]
Chen, Lin [1]
Qin, Tianyi [1]
Song, Jiahui [1]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2022 / Vol. 12 / Issue 15
Funding
China Postdoctoral Science Foundation;
Keywords
multi-modal clustering; unsupervised learning; 3D shapes; contrastive learning;
DOI
10.3390/app12157384
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703;
Abstract
3D shape clustering is developing into an important research subject with the wide applications of 3D shapes in computer vision and multimedia fields. Since 3D shapes generally take on various modalities, how to comprehensively exploit the multi-modal properties to boost clustering performance has become a key issue for the 3D shape clustering task. Taking into account the advantages of multiple views and point clouds, this paper proposes the first multi-modal 3D shape clustering method, named the dual contrastive learning network (DCL-Net), to discover the clustering partitions of unlabeled 3D shapes. First, by simultaneously performing cross-view contrastive learning within multi-view modality and cross-modal contrastive learning between the point cloud and multi-view modalities in the representation space, a representation-level dual contrastive learning module is developed, which aims to capture discriminative 3D shape features for clustering. Meanwhile, an assignment-level dual contrastive learning module is designed by further ensuring the consistency of clustering assignments within the multi-view modality, as well as between the point cloud and multi-view modalities, thus obtaining more compact clustering partitions. Experiments on two commonly used 3D shape benchmarks demonstrate the effectiveness of the proposed DCL-Net.
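The representation-level idea in the abstract can be illustrated with a generic contrastive objective. The sketch below is not the authors' DCL-Net implementation: it assumes an InfoNCE-style loss, two rendered views per shape, one point-cloud feature per shape, and an unweighted sum of the cross-view and cross-modal terms. The function names, the temperature value, and the weighting are all hypothetical choices made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are left as-is)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(anchors, positives, temperature=0.5):
    """Mean InfoNCE loss: anchors[i] should match positives[i] and no other row."""
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, a_i in enumerate(a):
        # Cosine similarity of shape i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(a_i, p_j)) / temperature for p_j in p]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


def symmetric_info_nce(x, y, temperature=0.5):
    """Average the loss over both matching directions."""
    return 0.5 * (info_nce(x, y, temperature) + info_nce(y, x, temperature))


def dual_contrastive_loss(view_a, view_b, point_cloud, temperature=0.5):
    """Cross-view term plus cross-modal term (equal weighting is an assumption)."""
    cross_view = symmetric_info_nce(view_a, view_b, temperature)
    cross_modal = 0.5 * (
        symmetric_info_nce(point_cloud, view_a, temperature)
        + symmetric_info_nce(point_cloud, view_b, temperature)
    )
    return cross_view + cross_modal
```

Each argument is a list of per-shape feature vectors, with row i of every list describing the same shape. One way to build an assignment-level analogue, again as an assumption rather than the paper's stated formulation, would be to apply the same loss to the columns of the soft cluster-assignment matrices instead of the rows of the feature matrices.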
Pages: 13