Dynamic Gesture Recognition Based on Two-Scale 3-D-ConvNeXt

Times cited: 1

Authors:

Hao, Sida [1]

Fu, Min [1]

Liu, Xuefeng [2]

Zheng, Bing [1]

Affiliations:

[1] Ocean Univ China, Sanya Oceanog Inst, Sanya 572024, Peoples R China

[2] Qingdao Univ Sci & Technol, Coll Automat & Elect Engn, Qingdao 266100, Peoples R China

Source:

IEEE SENSORS JOURNAL | 2023 / Vol. 23 / No. 23

Keywords:

ConvNeXt; gesture recognition; human-computer interaction; spatiotemporal information

DOI:

10.1109/JSEN.2023.3324479

Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronic technology, communication technology]

Discipline classification codes:

0808; 0809

Abstract:

As a straightforward method for human-machine interaction, gesture recognition is vital in many practical applications. However, effectively extracting spatiotemporal information from video is still a fundamental problem and designing an accurate and efficient network is a feasible solution. The ConvNeXt, renowned for its superior still image processing capabilities, is chosen as the basis of this work. Then, the network is extended to a 3-D pattern for dynamic data and a two-scale convolution kernel is introduced to focus on the hand region. Therefore, a novel two-scale 3-D-ConvNeXt network (TS3C-Net) is established. Furthermore, the Mixup, Cutmix data augmentation, and label smoothing regularization are also applied to enhance the performance further. The experiments show that the accuracy of the proposed TS3C-Net achieves 95.36%, 97.1%, and 87.55% on EgoGesture, Jester, and NVGesture datasets, respectively.
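The regularization steps named in the abstract can be illustrated with a minimal sketch of label smoothing followed by Mixup, using only the Python standard library. This is not the authors' implementation. The function names, the flattened-list representation of a clip, and the values `epsilon=0.1` and `alpha=0.2` are assumptions chosen for illustration, since the record does not state the settings used. CutMix, which pastes a region of one sample into another instead of blending, is omitted here.

```python
import random


def smooth_labels(class_index, num_classes, epsilon=0.1):
    """Label smoothing: (1 - epsilon) on the true class plus epsilon / K on every class.

    epsilon=0.1 is an assumed value, not one taken from the paper.
    """
    uniform = epsilon / num_classes
    target = [uniform] * num_classes
    target[class_index] += 1.0 - epsilon
    return target


def mixup(sample_a, target_a, sample_b, target_b, alpha=0.2, rng=random):
    """Mixup: blend two samples and their soft targets with lam ~ Beta(alpha, alpha).

    Samples are flat lists of floats here (an assumption for brevity);
    a real video clip would be a multi-dimensional tensor.
    """
    lam = rng.betavariate(alpha, alpha)
    mixed_sample = [lam * a + (1.0 - lam) * b for a, b in zip(sample_a, sample_b)]
    mixed_target = [lam * a + (1.0 - lam) * b for a, b in zip(target_a, target_b)]
    return mixed_sample, mixed_target, lam


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the example is repeatable
    clip_a, clip_b = [0.0, 1.0, 2.0], [4.0, 3.0, 2.0]  # toy stand-ins for clips
    y_a = smooth_labels(0, num_classes=5)
    y_b = smooth_labels(3, num_classes=5)
    mixed_clip, mixed_y, lam = mixup(clip_a, y_a, clip_b, y_b, rng=rng)
    print(lam, mixed_clip, mixed_y)
```

Because both inputs to `mixup` are valid probability vectors, the blended target still sums to 1, so it can be used directly with a soft-target cross-entropy loss.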


Pages: 29227-29234

Page count: 8
