DTMMN: Deep transfer multi-metric network for RGB-D action recognition

Cited by: 0
Authors
Qin X. [2 ]
Ge Y. [1 ,2 ]
Feng J. [2 ]
Yang D. [1 ,2 ]
Chen F. [1 ,2 ]
Huang S. [1 ,2 ]
Xu L. [1 ,2 ]
Affiliations
[1] Ministry of Education, Key Laboratory of Dependable Service Computing in Cyber Physical Society (Chongqing University), Chongqing
[2] School of Big Data & Software Engineering, Chongqing University, Chongqing
Funding
National Natural Science Foundation of China;
Keywords
Multi-metric learning; RGB-D action recognition; Transfer network;
DOI
10.1016/j.neucom.2020.04.034
Abstract
In the field of action recognition, fusing multi-modality information is a major research direction. Although many state-of-the-art methods have emerged recently, they lack consideration of two problems: modality deficiency and modality distribution. This paper proposes a novel two-stage method, named deep transfer multi-metric network (DTMMN), to address both. First, we train a transfer network to learn the mapping from the RGB modality to the depth modality, addressing modality deficiency; we then apply multi-metric learning to capture the relations and differences among feature distributions for action recognition. Experiments on two challenging datasets show that our approach outperforms state-of-the-art methods on both. © 2020 Elsevier B.V.
Pages: 127-134
Page count: 7