GaitRA: triple-branch multimodal gait recognition with larger effective receptive fields and mixed attention

Cited by: 0
Authors
Xue L. [1]
Tao Z. [1]
Affiliations
[1] Software College, Hebei Normal University, Shijiazhuang
Keywords
Effective receptive field; Gait recognition; Motion pattern; Multimodal
DOI
10.1007/s11042-024-19596-9
Abstract
Gait recognition, a long-distance biometric technique for identity recognition, has attracted widespread attention in recent years. Previous work typically employs small convolutional networks to extract single-modal features from either the silhouette or the joint skeleton. However, silhouette-based methods are susceptible to clothing variations, while skeleton-based methods suffer from missing physique information. We therefore propose a novel multimodal triple-branch network, dubbed GaitRA, that acquires gait features from the silhouette and the skeleton simultaneously. GaitRA consists of three branches: a 3D-CNN branch that extracts primary features from silhouette sequences, a 2D-CNN branch that extracts secondary features from silhouette sequences, and a Spatio-Temporal Graph Convolution (ST-GCN) branch that extracts joint-skeleton features. More importantly, we introduce the RepLK-ACTION Module, which combines the RepLK Block, whose macro design follows the Swin Transformer, with the ACTION Module, a mixed-attention mechanism drawn from action recognition. The RepLK-ACTION Module establishes larger Effective Receptive Fields (ERFs) through large-kernel re-parameterized convolution, capturing more discriminative multimodal gait information and thereby improving performance under complex walking conditions. Experiments demonstrate that GaitRA significantly improves performance on both the CASIA-B and Gait3D datasets. In particular, the proposed method achieves 73.1% (Rank-1), 86.4% (Rank-5), 64.3% (mAP), and 36.2% (mINP) on Gait3D. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
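To make the re-parameterization idea concrete, the following sketch (PyTorch) shows the general RepLK-style technique the abstract refers to: a depthwise large-kernel convolution is trained alongside a parallel small-kernel branch, and the two kernels are merged into a single convolution for inference. The class name, the 31/5 kernel sizes, and the omission of BatchNorm fusion are illustrative assumptions, not details taken from GaitRA itself.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ReparamLargeKernelConv(nn.Module):
    # A minimal sketch of RepLK-style large-kernel re-parameterization.
    # Training: the large-kernel and small-kernel depthwise branches are
    # summed, so the small kernel eases optimization while the large kernel
    # enlarges the effective receptive field (ERF).
    # Inference: reparameterize() zero-pads the small kernel to the large
    # size and merges both branches into one convolution.
    # Kernel sizes (31, 5) are assumptions; GaitRA's exact choices may differ.

    def __init__(self, channels: int, large_k: int = 31, small_k: int = 5):
        super().__init__()
        self.large = nn.Conv2d(channels, channels, large_k,
                               padding=large_k // 2, groups=channels)
        self.small = nn.Conv2d(channels, channels, small_k,
                               padding=small_k // 2, groups=channels)
        self.merged = None  # populated by reparameterize()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.merged is not None:            # inference: single merged conv
            return self.merged(x)
        return self.large(x) + self.small(x)   # training: parallel branches

    @torch.no_grad()
    def reparameterize(self) -> None:
        # Zero-padding the small kernel to the large size is exactly
        # equivalent to running the small conv with its own (smaller) padding.
        pad = (self.large.kernel_size[0] - self.small.kernel_size[0]) // 2
        weight = self.large.weight + F.pad(self.small.weight, [pad] * 4)
        bias = self.large.bias + self.small.bias
        self.merged = nn.Conv2d(self.large.in_channels, self.large.out_channels,
                                self.large.kernel_size,
                                padding=self.large.padding,
                                groups=self.large.groups)
        self.merged.weight.copy_(weight)
        self.merged.bias.copy_(bias)

After training, calling reparameterize() once leaves a single large-kernel depthwise convolution, so inference pays the cost of one kernel while keeping the enlarged ERF that the two-branch training produced.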
Pages: 80225-80259
Page count: 34
Related papers (59 in total)
[21] Yu S., Tan D., Tan T., A framework for evaluating the effect of view angle, clothing and carrying condition on gait recognition, 18th International Conference on Pattern Recognition (ICPR'06), pp. 441-444, (2006)
[22] Zheng J., Liu X., Liu W., He L., Yan C., Mei T., Gait recognition in the wild with dense 3D representations and a benchmark, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 20228-20237, (2022)
[23] Ding X., Zhang X., Han J., Ding G., Scaling up your kernels to 31x31: Revisiting large kernel design in CNNs, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11963-11975, (2022)
[24] Luo W., Li Y., Urtasun R., Zemel R., Understanding the effective receptive field in deep convolutional neural networks, Proceedings of the 30th International Conference on Neural Information Processing Systems, pp. 4905-4913, (2016)
[25] Yan S., Xiong Y., Lin D., Spatial temporal graph convolutional networks for skeleton-based action recognition, Proceedings of the AAAI Conference on Artificial Intelligence, (2018)
[26] Wang Z., She Q., Smolic A., ACTION-Net: Multipath excitation for action recognition, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13214-13223, (2021)
[27] Zhu H., Zheng W., Zheng Z., Nevatia R., GaitRef: Gait recognition with refined sequential skeletons, Proceedings of the 2023 IEEE International Joint Conference on Biometrics (IJCB), pp. 1-10, (2023)
[28] Liu Z., Lin Y., Cao Y., Hu H., Wei Y., Zhang Z., Lin S., Guo B., Swin Transformer: Hierarchical vision transformer using shifted windows, Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012-10022, (2021)
[29] Chollet F., Xception: Deep learning with depthwise separable convolutions, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251-1258, (2017)
[30] Zhao G., Liu G., Li H., Pietikäinen M., 3D gait recognition using multiple cameras, 7th International Conference on Automatic Face and Gesture Recognition (FGR06), pp. 529-534, (2006)