SeqViews2SeqLabels: Learning 3D Global Features via Aggregating Sequential Views by RNN With Attention

Times cited: 172
Authors
Han, Zhizhong [1 ,2 ]
Shang, Mingyang [1 ]
Liu, Zhenbao [3 ]
Vong, Chi-Man [5 ]
Liu, Yu-Shen [1 ]
Zwicker, Matthias [6 ]
Han, Junwei [4 ]
Chen, C. L. Philip [5 ]
Affiliations
[1] Tsinghua Univ, Sch Software, Beijing 100084, Peoples R China
[2] Univ Maryland, Dept Comp Sci, College Pk, MD 20737 USA
[3] Northwestern Polytech Univ, Sch Aeronaut, Xian 710072, Shaanxi, Peoples R China
[4] Northwestern Polytech Univ, Sch Automat, Xian 710072, Shaanxi, Peoples R China
[5] Univ Macau, Dept Comp & Informat Sci, Macau 99999, Peoples R China
[6] Univ Maryland, College Pk, MD 20737 USA
Funding
Swiss National Science Foundation; National Natural Science Foundation of China;
Keywords
3D feature learning; sequential views; sequential labels; view aggregation; RNN; attention; NEURAL-NETWORK;
DOI
10.1109/TIP.2018.2868426
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning 3D global features by aggregating multiple views has been introduced as a successful strategy for 3D shape analysis. In recent deep learning models with end-to-end training, pooling is a widely adopted procedure for view aggregation. However, pooling merely retains the max or mean value over all views, which disregards the content information of almost all views and also the spatial information among the views. To resolve these issues, we propose Sequential Views To Sequential Labels (SeqViews2SeqLabels), a novel deep learning model with an encoder-decoder structure based on recurrent neural networks (RNNs) with attention. SeqViews2SeqLabels consists of two connected parts, an encoder-RNN followed by a decoder-RNN, which respectively learn the global features by aggregating sequential views and perform shape classification from the learned global features. Specifically, the encoder-RNN learns the global features by simultaneously encoding the spatial and content information of sequential views, which captures the semantics of the view sequence. With the proposed prediction of sequential labels, the decoder-RNN performs more accurate classification from the learned global features by predicting sequential labels step by step. Learning to predict sequential labels provides more, and finer-grained, discriminative information among shape classes, which alleviates the overfitting problem inherent in training with a limited number of 3D shapes. Moreover, we introduce an attention mechanism to further improve the discriminative ability of SeqViews2SeqLabels. This mechanism increases the weight of views that are distinctive to each shape class, and it dramatically reduces the effect of selecting the first view position. Shape classification and retrieval results on three large-scale benchmarks verify that SeqViews2SeqLabels learns more discriminative global features by aggregating sequential views more effectively than state-of-the-art methods.
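The abstract contrasts order-blind pooling with an encoder-RNN that reads views as a sequence and with attention that reweights views. The following is a minimal, framework-free sketch of those three ideas, not the authors' implementation. Several assumptions are involved. Untrained random weights stand in for learned ones. The encoder is a plain tanh RNN cell. Attention is simple dot-product scoring, swapped in for illustration, so the paper's exact attention formulation is not reproduced. The function names `max_pool`, `mean_pool`, `rnn_encode`, and `attend` are hypothetical. The sequential-label decoder is not shown.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def add(a, b):
    return [ai + bi for ai, bi in zip(a, b)]

def max_pool(views):
    """Element-wise max over per-view features; view order is discarded."""
    return [max(col) for col in zip(*views)]

def mean_pool(views):
    """Element-wise mean over per-view features; view order is discarded."""
    return [sum(col) / len(views) for col in zip(*views)]

def rnn_encode(views, W_x, W_h):
    """Plain tanh RNN (an assumed cell type): h_t = tanh(W_x v_t + W_h h_{t-1}).
    Returns the hidden state after each view, so the result depends on view order."""
    h = [0.0] * len(W_h)
    states = []
    for v in views:
        h = [math.tanh(z) for z in add(matvec(W_x, v), matvec(W_h, h))]
        states.append(h)
    return states

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, states):
    """Dot-product attention (illustrative choice): weight each encoder state by
    its similarity to the query and return the weights and the weighted sum."""
    weights = softmax([sum(q * s for q, s in zip(query, st)) for st in states])
    context = [sum(w * st[k] for w, st in zip(weights, states))
               for k in range(len(states[0]))]
    return weights, context

if __name__ == "__main__":
    random.seed(0)
    dim_in, dim_h, n_views = 4, 3, 6
    views = [[random.uniform(-1, 1) for _ in range(dim_in)] for _ in range(n_views)]
    W_x = [[random.uniform(-0.5, 0.5) for _ in range(dim_in)] for _ in range(dim_h)]
    W_h = [[random.uniform(-0.5, 0.5) for _ in range(dim_h)] for _ in range(dim_h)]

    # Same views, but a different first view position.
    rotated = views[2:] + views[:2]
    print(max_pool(views) == max_pool(rotated))  # True: pooling ignores order

    states = rnn_encode(views, W_x, W_h)
    states_rot = rnn_encode(rotated, W_x, W_h)
    print("final RNN state (original):", states[-1])
    print("final RNN state (rotated): ", states_rot[-1])

    weights, context = attend(states[-1], states)
    print("attention weights over views:", weights)
```

Because the pooled vector is identical for any reordering of the views, pooling cannot encode spatial relationships between views. The RNN's final state changes when the starting view moves. That sensitivity to the first view position is what the abstract says the attention mechanism reduces.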
Pages: 658-672
Number of pages: 15
References: 54 entries