MHSAN: Multi-view hierarchical self-attention network for 3D shape recognition

Times cited: 5

Authors:

Cao, Jiangzhong [1]

Yu, Lianggeng [1]

Ling, Bingo Wing-Kuen [1]

Yao, Zijie [1]

Dai, Qingyun [2,3]

Affiliations:

[1] Guangdong Univ Technol, Sch Informat Engn, Guangzhou 510006, Peoples R China

[2] Guangdong Polytech Normal Univ, Guangzhou 510665, Peoples R China

[3] Guangdong Prov Key Lab Intellectual Property & Big, Guangzhou 510665, Peoples R China

Source:

PATTERN RECOGNITION | 2024 / Vol. 150

Funding:

National Natural Science Foundation of China;

Keywords:

3D shape recognition; Self-attention; Multi-view learning; View aggregation;

DOI:

10.1016/j.patcog.2024.110315

CLC number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104; 0812; 0835; 1405;

Abstract:

Multi-view learning has demonstrated promising performance in 3D shape recognition. However, existing multi-view methods usually focus on fusing multiple views and ignore the structural and discriminative information carried by the 2D views. In this paper, we propose a multi-view hierarchical self-attention network (MHSAN) to explore the geometric and discriminative information in complex 2D views. Specifically, MHSAN consists of two self-attention networks. First, a global self-attention network exploits structural information by embedding the position information of the views. Then, a discriminative self-attention network learns discriminative information from the views with high classification scores. Through the proposed MHSAN, the geometric and discriminative information is condensed into a novel representation of 3D shapes. To validate the effectiveness of the proposed method, extensive experiments have been conducted on three 3D shape benchmarks. Experimental results demonstrate that our method is generally superior to state-of-the-art methods in 3D shape classification and retrieval tasks.
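The two-stage idea described in the abstract (attention over all position-embedded views, then attention over only the highest-scoring views) can be illustrated with a minimal sketch. This is not the authors' MHSAN implementation: the sinusoidal position encoding, the identity query/key/value projections, the hypothetical linear `classifier` used to score views, the top-`k` selection rule, and the mean-pool-and-concatenate fusion are all assumptions chosen for brevity.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def positional_encoding(index: int, dim: int) -> Vector:
    """Sinusoidal encoding of a view's index (an assumed form of 'position information')."""
    return [
        math.sin(index / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(index / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


def self_attention(views: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention with identity Q/K/V projections."""
    dim = len(views[0])
    scale = math.sqrt(dim)
    out = []
    for q in views:
        # Attention weights of this query view over all views (each row sums to 1).
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in views])
        # Weighted sum of the value vectors (here, the views themselves).
        out.append([sum(w * v[j] for w, v in zip(weights, views)) for j in range(dim)])
    return out


def mean_pool(views: List[Vector]) -> Vector:
    """Average a set of view features into one vector."""
    n = len(views)
    return [sum(v[j] for v in views) / n for j in range(len(views[0]))]


def top_k_views(views: List[Vector], classifier: List[Vector], k: int) -> List[int]:
    """Indices of the k views whose highest class probability is largest.

    `classifier` is a hypothetical linear layer: one weight row per class.
    """
    scores = []
    for v in views:
        logits = [sum(w * x for w, x in zip(row, v)) for row in classifier]
        scores.append(max(softmax(logits)))
    return sorted(range(len(views)), key=lambda i: scores[i], reverse=True)[:k]


def hierarchical_descriptor(views: List[Vector], classifier: List[Vector], k: int) -> Vector:
    """Global stage over all position-embedded views, then a second stage over the top-k views."""
    dim = len(views[0])
    # Stage 1: add each view's position encoding, then attend across all views.
    embedded = [
        [x + p for x, p in zip(v, positional_encoding(i, dim))]
        for i, v in enumerate(views)
    ]
    global_feats = self_attention(embedded)
    # Stage 2: keep only the most confidently classified views and attend among them.
    chosen = top_k_views(global_feats, classifier, k)
    disc_feats = self_attention([global_feats[i] for i in chosen])
    # Fuse both stages by concatenating their mean-pooled features (an assumption).
    return mean_pool(global_feats) + mean_pool(disc_feats)
```

For example, three 4-dimensional view features with k=2 yield an 8-dimensional descriptor (4 values from each stage). Replacing the identity projections with learned weight matrices and multiple heads would bring each stage closer to a standard transformer encoder layer.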

References

Pages: 12

Total: 40 references

[1] Abnar S, 2020, 58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), P4190

[2] Brock A, 2016, Comput. Sci.

[3] Chen S, 2021, Arxiv, DOI arXiv:2110.13083

[4] Chen, Songle; Zheng, Lintao; Zhang, Yan; Sun, Zhixin; Xu, Kai. VERAM: View-Enhanced Recurrent Attention Model for 3D Shape Classification. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2019, 25(12): 3244-3257

[5] Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171

[6] Dosovitskiy A, 2021, INT C LEARN REPR ICL

[7] Feng, Yifan; Zhang, Zizhao; Zhao, Xibin; Ji, Rongrong; Gao, Yue. GVCNN: Group-View Convolutional Neural Networks for 3D Shape Recognition. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 264-272

[8] Gao Z, 2018, AAAI CONF ARTIF INTE, P2223

[9] Han, Zhizhong; Lu, Honglei; Liu, Zhenbao; Vong, Chi-Man; Liu, Yu-Shen; Zwicker, Matthias; Han, Junwei; Chen, C. L. Philip. 3D2SeqViews: Aggregating Sequential Views for 3D Global Feature Learning by CNN With Hierarchical Attention Aggregation. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2019, 28(08): 3986-3999

[10] He, Xinwei; Huang, Tengteng; Bai, Song; Bai, Xiang. View N-gram Network for 3D Object Retrieval. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 7514-7523
