American Sign Language fingerspelling recognition in the wild with spatio-temporal feature extraction and multi-task learning

Cited by: 4

Authors:

Pannattee, Peerawat ^{[1]}

Kumwilaisak, Wuttipong ^{[1]}

Hansakunbuntheung, Chatchawarn ^{[2]}

Thatphithakkul, Nattanun ^{[2]}

Kuo, C.-C. Jay ^{[3]}

Affiliations:

[1] King Mongkuts Univ Technol Thonburi, Dept Elect & Telecommun Engn, Bangkok 10140, Thailand

[2] Natl Sci & Technol Dev Agcy, Assist Technol & Med Devices Res Ctr, Pathum Thani 12120, Thailand

[3] Univ Southern Calif, Ming Hsieh Dept Elect & Comp Engn, Los Angeles, CA 90007 USA

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2024 / Vol. 243

Keywords:

Fingerspelling recognition; Variable-filter-length temporal-learning convolutional neural network; Multi-task learning; Supervised contrastive learning; Joint CTC/attention-based decoding

DOI:

10.1016/j.eswa.2023.122901

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This study introduces a comprehensive approach to enhance the performance of fingerspelling recognition systems in dynamic environments. The methodology begins with spatial feature extraction using MobileNetV3Small, followed by transformation through a projection layer into a latent space. The Variable-Filter-Length Temporal-Learning Convolutional Neural Network (VTCNN) is then applied to extract both short-range and long-range temporal features, providing a robust representation of dynamic gestures. The recognition system incorporates a shared encoder for both the Connectionist Temporal Classification (CTC) decoder and the attention-based decoder, capitalizing on the unique strengths of each decoder. To address weak supervision challenges, a novel strategy involving supervised contrastive learning (SupCon) during retraining is proposed. Leveraging decoding results from the CTC decoder, an image set with frame labels is constructed, contributing to more efficient differentiation between fingerspelling gestures and improving overall accuracy. The final step involves a joint CTC/attention-based decoding strategy using the beam search algorithm. This approach effectively combines decoder outputs, resulting in superior recognition performance. The synergistic interplay of the proposed methods (VTCNN for temporal feature extraction, multi-task learning for leveraging decoder strengths, SupCon for feature clustering refinement, and joint decoding) culminates in a holistic and state-of-the-art fingerspelling recognition system, validated through benchmarking on the ChicagoFSWild and ChicagoFSWild+ datasets.
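
The joint CTC/attention-based decoding step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common log-linear interpolation of the two decoder scores; the abstract does not state the exact combination rule, so the weight `lam`, the function names `joint_score` and `rescore`, and the toy log-probabilities are all hypothetical. For brevity the sketch rescores a fixed candidate list rather than running a full beam search.

```python
import math


def joint_score(ctc_logp, att_logp, lam=0.5):
    """Log-linear interpolation of two decoder scores for one hypothesis.

    lam is a hypothetical interpolation weight in [0, 1]:
    lam = 1 uses only the CTC score, lam = 0 only the attention score.
    """
    return lam * ctc_logp + (1.0 - lam) * att_logp


def rescore(hypotheses, lam=0.5):
    """Return the text of the hypothesis with the highest combined score.

    hypotheses: list of (text, ctc_logp, att_logp) tuples.
    """
    return max(hypotheses, key=lambda h: joint_score(h[1], h[2], lam))[0]


# Toy candidates with made-up log-probabilities (not results from the paper).
candidates = [
    ("hello", math.log(0.30), math.log(0.50)),
    ("hallo", math.log(0.40), math.log(0.20)),
    ("helo", math.log(0.20), math.log(0.25)),
]

best_joint = rescore(candidates, lam=0.3)     # both decoders contribute
best_ctc_only = rescore(candidates, lam=1.0)  # CTC score alone
```

In this toy setup the two decoders disagree on the top candidate, and the weight decides which one prevails. In an actual beam search the same combined score would be applied to partial hypotheses at each expansion step.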


Pages: 17
