American Sign Language fingerspelling recognition in the wild with spatio-temporal feature extraction and multi-task learning

Cited by: 2

Authors:

Pannattee, Peerawat [1]

Kumwilaisak, Wuttipong [1]

Hansakunbuntheung, Chatchawarn [2]

Thatphithakkul, Nattanun [2]

Kuo, C.-C. Jay [3]

Affiliations:

[1] King Mongkuts Univ Technol Thonburi, Dept Elect & Telecommun Engn, Bangkok 10140, Thailand

[2] Natl Sci & Technol Dev Agcy, Assist Technol & Med Devices Res Ctr, Pathum Thani 12120, Thailand

[3] Univ Southern Calif, Ming Hsieh Dept Elect & Comp Engn, Los Angeles, CA 90007 USA

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2024 / Volume 243

Keywords:

Fingerspelling recognition; Variable-filter-length temporal-learning convolutional neural network; Multi-task learning; Supervised contrastive learning; Joint CTC/attention-based decoding

DOI:

10.1016/j.eswa.2023.122901

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This study introduces a comprehensive approach to enhance the performance of fingerspelling recognition systems in dynamic environments. The methodology begins with spatial feature extraction using MobileNetV3Small, followed by transformation through a projection layer into a latent space. The Variable-Filter-Length Temporal-Learning Convolutional Neural Network (VTCNN) is then applied to extract both short-range and long-range temporal features, providing a robust representation of dynamic gestures. The recognition system uses a shared encoder for both the Connectionist Temporal Classification (CTC) decoder and the attention-based decoder, capitalizing on the strengths of each decoder. To address weak supervision challenges, a novel strategy involving supervised contrastive learning (SupCon) during retraining is proposed. Using decoding results from the CTC decoder, an image set with frame labels is constructed, which helps the model differentiate fingerspelling gestures more effectively and improves overall accuracy. The final step is a joint CTC/attention-based decoding strategy using the beam search algorithm, which combines the decoder outputs and results in superior recognition performance. The combination of the proposed methods (VTCNN for temporal feature extraction, multi-task learning for leveraging decoder strengths, SupCon for feature clustering refinement, and joint decoding) yields a state-of-the-art fingerspelling recognition system, validated through benchmarking on the ChicagoFSWild and ChicagoFSWild+ datasets.
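
The joint CTC/attention-based decoding step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how the two decoder scores are combined, so the code below assumes the log-linear interpolation commonly used in hybrid CTC/attention systems. The function names `joint_score` and `rescore`, the weight `ctc_weight = 0.3`, and all log-probabilities are hypothetical and chosen only for illustration. For simplicity the sketch rescores complete hypotheses; a real beam search would typically apply the same interpolation to partial hypotheses at every step.

```python
import math


def joint_score(ctc_logp: float, att_logp: float, ctc_weight: float = 0.3) -> float:
    """Interpolate the two decoder log-probabilities for one hypothesis.

    Assumption: score = w * log p_ctc + (1 - w) * log p_att.
    The abstract does not give this formula or the weight.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_logp + (1.0 - ctc_weight) * att_logp


def rescore(hypotheses, ctc_weight: float = 0.3):
    """Sort (text, ctc_logp, att_logp) tuples by joint score, best first."""
    return sorted(
        hypotheses,
        key=lambda h: joint_score(h[1], h[2], ctc_weight),
        reverse=True,
    )


# Hypothetical beam contents; the probabilities are made up.
beam = [
    ("hello", math.log(0.20), math.log(0.50)),
    ("hallo", math.log(0.40), math.log(0.30)),
    ("helo", math.log(0.10), math.log(0.10)),
]

best_text = rescore(beam, ctc_weight=0.3)[0][0]
print(best_text)
```

With `ctc_weight=0.3` the attention score dominates the ranking, while `ctc_weight=1.0` ranks the hypotheses by the CTC score alone, which shows how the weight trades off the two decoders.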


Pages: 17
