American Sign Language fingerspelling recognition in the wild with spatio-temporal feature extraction and multi-task learning

Cited by: 2
Authors
Pannattee, Peerawat [1]
Kumwilaisak, Wuttipong [1]
Hansakunbuntheung, Chatchawarn [2]
Thatphithakkul, Nattanun [2]
Kuo, C.-C. Jay [3]
Affiliations
[1] King Mongkuts Univ Technol Thonburi, Dept Elect & Telecommun Engn, Bangkok 10140, Thailand
[2] Natl Sci & Technol Dev Agcy, Assist Technol & Med Devices Res Ctr, Pathum Thani 12120, Thailand
[3] Univ Southern Calif, Ming Hsieh Dept Elect & Comp Engn, Los Angeles, CA 90007 USA
Keywords
Fingerspelling recognition; Variable-filter-length temporal-learning convolutional neural network; Multi-task learning; Supervised contrastive learning; Joint CTC/attention-based decoding
DOI
10.1016/j.eswa.2023.122901
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This study introduces a comprehensive approach to improving the performance of fingerspelling recognition systems in dynamic environments. Spatial features are first extracted with MobileNetV3Small and then mapped into a latent space by a projection layer. The Variable-Filter-Length Temporal-Learning Convolutional Neural Network (VTCNN) is then applied to extract both short-range and long-range temporal features, providing a robust representation of dynamic gestures. The recognition system uses a shared encoder for both the Connectionist Temporal Classification (CTC) decoder and the attention-based decoder, capitalizing on the strengths of each decoder. To address the challenges of weak supervision, a novel strategy involving supervised contrastive learning (SupCon) during retraining is proposed: decoding results from the CTC decoder are used to construct an image set with frame labels, which helps differentiate fingerspelling gestures more efficiently and improves overall accuracy. The final step is a joint CTC/attention-based decoding strategy using the beam search algorithm, which combines the outputs of the two decoders and yields superior recognition performance. Together, the proposed methods (VTCNN for temporal feature extraction, multi-task learning for leveraging decoder strengths, SupCon for feature clustering refinement, and joint decoding) form a holistic, state-of-the-art fingerspelling recognition system, validated through benchmarking on the ChicagoFSWild and ChicagoFSWild+ datasets.
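The joint CTC/attention decoding idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common log-linear formulation, in which a candidate's sequence log-probabilities from the two decoders are interpolated with a weight. The weight value, the function names, and the candidate scores below are hypothetical and are not taken from the paper. The sketch also simplifies the procedure to rescoring a finished candidate list, whereas the abstract describes combining the decoders within beam search.

```python
import math


def joint_score(ctc_logp, att_logp, ctc_weight=0.3):
    """Log-linear interpolation of the two decoders' sequence log-probabilities.

    ctc_weight is a hypothetical value chosen for illustration only.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_logp + (1.0 - ctc_weight) * att_logp


def rescore(hypotheses, ctc_weight=0.3):
    """Rank (text, ctc_logp, att_logp) candidates by joint score, best first."""
    scored = [
        (text, joint_score(ctc_logp, att_logp, ctc_weight))
        for text, ctc_logp, att_logp in hypotheses
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up candidate list: (text, CTC log-probability, attention log-probability).
candidates = [
    ("CHICAGO", math.log(0.60), math.log(0.35)),
    ("CHICAGA", math.log(0.10), math.log(0.40)),
    ("CHICACO", math.log(0.20), math.log(0.15)),
]

best_text, best_score = rescore(candidates)[0]
print(best_text, round(best_score, 4))
```

In this toy list the attention decoder alone slightly prefers the misspelled candidate, while the CTC score pulls the joint ranking back to the correct spelling. That is the kind of complementary behavior a joint decoder is meant to exploit.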
Pages: 17