DTMMN: Deep transfer multi-metric network for RGB-D action recognition

Cited by: 0
Authors
Qin X. [2 ]
Ge Y. [1 ,2 ]
Feng J. [2 ]
Yang D. [1 ,2 ]
Chen F. [1 ,2 ]
Huang S. [1 ,2 ]
Xu L. [1 ,2 ]
Affiliations
[1] Ministry of Education, Key Laboratory of Dependable Service Computing in Cyber Physical Society (Chongqing University), Chongqing
[2] School of Big Data & Software Engineering, Chongqing University, Chongqing
Funding
National Natural Science Foundation of China;
Keywords
Multi-metric learning; RGB-D action recognition; Transfer network;
DOI
10.1016/j.neucom.2020.04.034
Abstract
In the field of action recognition, fusing multi-modality information is a major research direction. Although many state-of-the-art methods have emerged recently, they lack consideration of two problems: modality deficiency and modality distribution. This paper proposes a novel two-stage method, named deep transfer multi-metric network (DTMMN), to address both. First, we train a transfer network to learn the mapping from the RGB modality to the depth modality, addressing modality deficiency; we then apply multi-metric learning to capture the relations and differences among feature distributions for action recognition. Experiments on two challenging datasets show that our approach outperforms state-of-the-art methods on both. © 2020 Elsevier B.V.
Pages: 127-134
Page count: 7