HMTN: Hierarchical Multi-scale Transformer Network for 3D Shape Recognition

Cited by: 3

Authors:

Zhao, Yue [1,2]

Nie, Weizhi [1]

Gao, Zan [3]

Liu, An-an [1,2]

Affiliations:

[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China

[2] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei, Peoples R China

[3] Shandong Artificial Intelligence Inst, Jinan, Peoples R China

Source:

PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022

Funding:

National Natural Science Foundation of China;

Keywords:

3D Shape Recognition; Transformer; Hierarchical Network;

DOI:

10.1145/3503161.3548140

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

As an important field of multimedia, 3D shape recognition has attracted much research attention in recent years. Of the various approaches proposed, multiview-based methods show promising performance. In general, an effective 3D shape recognition algorithm should take both local and global multiview visual information into consideration, and should explore the inherent properties of the generated 3D descriptors to guarantee the quality of feature alignment in the common space. To tackle these issues, we propose a novel Hierarchical Multi-scale Transformer Network (HMTN) for the 3D shape recognition task. In HMTN, we propose a multi-level regional transformer (MLRT) module for shape descriptor generation. MLRT includes two branches that aim to extract intra-view local characteristics by modeling region-wise dependencies and to provide supervision from multiview global information at different granularities. Specifically, MLRT comprehensively considers the relations among different regions and focuses on the discriminative parts, which improves the effectiveness of the learned descriptors. Finally, we adopt a cross-granularity contrastive learning (CCL) mechanism for shape descriptor alignment in the common space. CCL explores and exploits cross-granularity semantic correlation to guide the descriptor extraction process while performing instance alignment based on category information. We evaluate the proposed network on several public benchmarks, and HMTN achieves competitive performance compared with state-of-the-art (SOTA) methods.
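
The abstract does not give the CCL objective, so the following is only a minimal sketch of the general idea of category-based contrastive alignment between descriptors at two granularities. It is not the authors' implementation: it substitutes a generic supervised InfoNCE-style loss, and the function names, the `coarse`/`fine` inputs, and the `temperature` parameter are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cross_granularity_contrastive_loss(coarse, fine, labels, temperature=0.1):
    """Hypothetical supervised InfoNCE-style loss (an assumption, not the paper's CCL).

    coarse, fine: lists of descriptors for the same shapes at two granularities.
    labels: category label of each shape.
    Each coarse descriptor is pulled toward fine descriptors of the same
    category and pushed away from fine descriptors of other categories.
    """
    coarse = [l2_normalize(v) for v in coarse]
    fine = [l2_normalize(v) for v in fine]
    total = 0.0
    for i, anchor in enumerate(coarse):
        # Cosine similarity to every fine descriptor, scaled by the temperature.
        logits = [sum(a * b for a, b in zip(anchor, f)) / temperature for f in fine]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Positives share the anchor's category; index i itself always qualifies.
        positives = [j for j, y in enumerate(labels) if y == labels[i]]
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
    return total / len(coarse)


if __name__ == "__main__":
    coarse = [[1.0, 0.0], [0.0, 1.0]]
    labels = [0, 1]
    print(cross_granularity_contrastive_loss(coarse, [[1.0, 0.0], [0.0, 1.0]], labels))
    print(cross_granularity_contrastive_loss(coarse, [[0.0, 1.0], [1.0, 0.0]], labels))
```

In this toy usage, the first call pairs each coarse descriptor with a matching fine descriptor of the same category and the second swaps them, so the first loss should come out smaller than the second.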


Pages: 9
