Learning an Augmented RGB Representation for Dynamic Hand Gesture Authentication

Times Cited: 0
Authors
Xie, Huilong [1 ,2 ]
Song, Wenwei [1 ,2 ]
Kang, Wenxiong [1 ,2 ,3 ]
Affiliations
[1] South China Univ Technol, Sch Automat Sci & Engn, Guangzhou 510641, Peoples R China
[2] Pazhou Lab, Guangzhou 510335, Peoples R China
[3] South China Univ Technol, Sch Future Technol, Guangzhou 510641, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Authentication; Optical flow; Physiology; Knowledge engineering; Feature extraction; Training; Task analysis; Hand gesture authentication; feature fusion; augmented representation; multi-modal information; cross-modal knowledge distillation; GAIT RECOGNITION; DISTILLATION; SYSTEM;
DOI
10.1109/TCSVT.2024.3398624
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Code
0808; 0809;
Abstract
Dynamic hand gesture authentication aims to recognize users' identities through the characteristics of their hand gestures. Extracting features favorable for verification is the key to success. Cross-modal knowledge distillation is an intuitive approach that introduces additional modality information in the training phase to enhance the target modality representation, improving model performance without incurring extra computation in the inference phase. However, most previous cross-modal knowledge distillation methods directly transfer information from one modality to another without considering the modality gap. In this paper, we propose a novel translation mechanism for cross-modal knowledge distillation that effectively mitigates the modality gap and exploits the information from the additional modality to enhance the target modality representation. To better transfer modality information, we propose a novel modality fusion-enhanced non-local (MFENL) module, which fuses the multi-modal information from the teacher network and enhances the fused features conditioned on the modality input to the student network. Based on the proposed cross-modal knowledge distillation method, we use cascaded MFENL modules as the translator to learn an enhanced RGB representation for dynamic hand gesture authentication. Extensive experiments on the SCUT-DHGA dataset demonstrate that our method has compelling advantages over state-of-the-art methods. The code is available at https://github.com/SCUT-BIP-Lab/TranslationCKD.
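The translation idea in the abstract can be illustrated with a deliberately minimal sketch: fuse the teacher's two modalities (e.g., RGB and optical flow), re-weight the fused features with attention computed from the student's RGB features (a toy, hypothetical stand-in for the MFENL translator), and distill with a feature-level MSE loss. All function names, the averaging fusion, and the loss choice are assumptions for illustration, not the authors' implementation.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [v / s for v in exps]

def matmul(a, b):
    """Plain list-of-lists matrix product (a: n x k, b: k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def non_local_fuse(teacher_rgb, teacher_flow, student_rgb):
    """Toy MFENL-style step (hypothetical): average-fuse the teacher's
    modalities, then re-weight the fused rows with scaled dot-product
    attention whose queries come from the student's RGB features."""
    fused = [[0.5 * (x + y) for x, y in zip(r1, r2)]
             for r1, r2 in zip(teacher_rgb, teacher_flow)]
    scale = math.sqrt(len(fused[0]))
    scores = matmul(student_rgb, [list(c) for c in zip(*fused)])
    attn = [softmax([v / scale for v in row]) for row in scores]
    return matmul(attn, fused)  # translated distillation target

def distill_loss(student_feat, target_feat):
    """Feature-level distillation objective: mean squared error."""
    n = len(student_feat) * len(student_feat[0])
    return sum((s - t) ** 2
               for rs, rt in zip(student_feat, target_feat)
               for s, t in zip(rs, rt)) / n

# Tiny deterministic example (2 tokens, 2 feature dims).
teacher_rgb = [[1.0, 0.0], [0.0, 1.0]]
teacher_flow = [[0.0, 1.0], [1.0, 0.0]]
student_rgb = [[1.0, 1.0], [0.5, 0.5]]
target = non_local_fuse(teacher_rgb, teacher_flow, student_rgb)
loss = distill_loss(student_rgb, target)  # 0.125 for this toy input
```

The point of the sketch is the ordering of operations: the student is never pushed directly toward raw flow features; it is pushed toward a fused-and-translated target, which is how a translator can soften the modality gap during distillation.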
Pages: 9195-9208
Page count: 14