Robust and Accurate Hand Gesture Authentication With Cross-Modality Local-Global Behavior Analysis

Cited by: 1
Authors
Zhang, Yufeng [1 ,2 ]
Kang, Wenxiong [1 ,3 ,4 ]
Song, Wenwei [5 ]
Affiliations
[1] South China Univ Technol, Sch Automat Sci & Engn, Guangzhou 510641, Peoples R China
[2] Pazhou Lab, Guangzhou 510335, Peoples R China
[3] South China Univ Technol, Sch Future Technol, Guangzhou 510641, Peoples R China
[4] Pazhou Lab, Guangzhou 510335, Peoples R China
[5] Southwest Jiaotong Univ, Sch Phys Sci & Technol, Chengdu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Authentication; Videos; Feature extraction; Physiology; Robustness; Lighting; Spatiotemporal phenomena; Biometrics; hand gesture authentication; multimodal fusion; spatiotemporal analysis; behavioral characteristic representation; NEURAL-NETWORKS; VERIFICATION; GEOMETRY; TERM;
DOI
10.1109/TIFS.2024.3451367
CLC Number
TP301 [Theory, Methods]
Discipline Code
081202
Abstract
Obtaining robust, fine-grained behavioral features is critical for dynamic hand gesture authentication. However, behavioral characteristics are abstract and complex, making them harder to capture than physiological characteristics. Moreover, the varying illumination and backgrounds of practical applications pose additional challenges to existing methods because the commonly used RGB videos are sensitive to both. To overcome this robustness limitation, we propose a two-stream CNN-based cross-modality local-global network (CMLG-Net) with two complementary modules that enhance the discriminability and robustness of behavioral features. First, we introduce a temporal scale pyramid (TSP) module consisting of multiple parallel convolution subbranches with different temporal kernel sizes, which captures fine-grained local motion cues at various temporal scales. Second, a cross-modality temporal non-local (CMTNL) module is devised to simultaneously aggregate global temporal features and cross-modality features with an attention mechanism. Through the complementary combination of the TSP and CMTNL modules, CMLG-Net obtains a comprehensive and robust behavioral representation that contains both multi-scale (short- and long-term) and multimodal (RGB-D) behavioral information. Extensive experiments on the largest dataset, SCUT-DHGA, and a simulated practical dataset, SCUT-DHGA-br, demonstrate the effectiveness of CMLG-Net in exploiting fine-grained behavioral features and complementary multimodal information. It achieves state-of-the-art performance with the lowest EERs of 0.497% and 4.848% under two challenging evaluation protocols, and it shows significantly superior robustness in practical scenes with unsatisfactory illumination and backgrounds. The code is available at https://github.com/SCUT-BIP-Lab/CMLG-Net.
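The core TSP idea described above, parallel temporal branches with different kernel sizes whose outputs are concatenated, can be sketched as follows. This is a conceptual NumPy illustration only, not the paper's implementation: a fixed averaging kernel stands in for learned convolution weights, and the function names are hypothetical.

```python
import numpy as np

def temporal_conv(x, k):
    """Temporal convolution with 'same' (edge) padding.
    x: (T, C) per-frame feature sequence; k: odd temporal kernel size.
    Uses a uniform averaging kernel as a stand-in for learned weights."""
    T, C = x.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    w = np.full(k, 1.0 / k)
    out = np.empty_like(x)
    for t in range(T):
        out[t] = w @ xp[t:t + k]  # weighted sum over a window of k frames
    return out

def temporal_scale_pyramid(x, kernel_sizes=(1, 3, 5)):
    """Run parallel temporal branches at different scales and
    concatenate their outputs along the channel axis."""
    return np.concatenate([temporal_conv(x, k) for k in kernel_sizes], axis=1)

x = np.random.rand(8, 4)         # 8 frames, 4 feature channels
y = temporal_scale_pyramid(x)
print(y.shape)                   # (8, 12): one 4-channel branch per scale
```

Short kernels preserve fast, fine-grained motion cues while longer kernels smooth over a wider temporal context; concatenating the branches gives downstream layers access to all scales at once, which is the multi-scale intuition behind the TSP module.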
Pages: 8630-8643 (14 pages)
References (70 in total)
[1] Aumi, Md Tanvir Islam; Kratz, Sven. AirAuth: Evaluating In-Air Hand Gestures for Authentication. Proceedings of the 16th ACM International Conference on Human-Computer Interaction with Mobile Devices and Services (MobileHCI'14), 2014, pp. 309-318.
[2] Ballas, N. Proceedings of the 4th International Conference on Learning Representations, 2016, p. 1.
[3] Carreira, Joao; Zisserman, Andrew. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017, pp. 4724-4733.
[4] Chao, Hanqing; Wang, Kun; He, Yiwei; Zhang, Junping; Feng, Jianfeng. GaitSet: Cross-View Gait Recognition Through Utilizing Gait As a Deep Set. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(7): 3467-3478.
[5] Cheng, Jun; Ren, Ziliang; Zhang, Qieshi; Gao, Xiangyang; Hao, Fusheng. Cross-Modality Compensation Convolutional Neural Networks for RGB-D Action Recognition. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(3): 1498-1509.
[6] Donahue, Jeff; Hendricks, Lisa Anne; Rohrbach, Marcus; Venugopalan, Subhashini; Guadarrama, Sergio; Saenko, Kate; Darrell, Trevor. Long-Term Recurrent Convolutional Networks for Visual Recognition and Description. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 677-691.
[7] Tran, Du; Bourdev, Lubomir; Fergus, Rob; Torresani, Lorenzo; Paluri, Manohar. Learning Spatiotemporal Features with 3D Convolutional Networks. 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 4489-4497.
[8] Farnebäck, G. Two-Frame Motion Estimation Based on Polynomial Expansion. Image Analysis, Proceedings, 2003, vol. 2749, pp. 363-370.
[9] Feichtenhofer, Christoph; Fan, Haoqi; Malik, Jitendra; He, Kaiming. SlowFast Networks for Video Recognition. 2019 IEEE/CVF International Conference on Computer Vision (ICCV 2019), 2019, pp. 6201-6210.
[10] Gomez-Alanis, Alejandro; Gonzalez-Lopez, Jose A.; Dubagunta, S. Pavankumar; Peinado, Antonio M.; Magimai-Doss, Mathew. On Joint Optimization of Automatic Speaker Verification and Anti-Spoofing in the Embedding Space. IEEE Transactions on Information Forensics and Security, 2021, 16: 1579-1593.