Cross-Sentence Gloss Consistency for Continuous Sign Language Recognition

Citations: 0
Authors
Rao, Qi [1]
Sun, Ke [2]
Wang, Xiaohan [3]
Wang, Qi [2]
Zhang, Bang [2]
Affiliations
[1] Univ Technol Sydney, ReLER, AAII, Ultimo, NSW, Australia
[2] Alibaba Grp, Inst Intelligent Comp, Hangzhou, Peoples R China
[3] Stanford Univ, Stanford, CA USA
Source
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 5 | 2024
Keywords
FRAMEWORK;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Continuous sign language recognition (CSLR) aims to recognize gloss sequences from continuous sign videos. Recent works enhance gloss representation consistency by mining correlations between the visual and contextual modules within individual sentences. However, much richer correlations remain among glosses across different sentences. In this paper, we present a simple yet effective Cross-Sentence Gloss Consistency (CSGC) method, which enforces glosses belonging to the same category to be more consistent in representation than those belonging to different categories, across all training sentences. Specifically, CSGC maintains a prototype for each gloss category and uses it to improve gloss discrimination in a contrastive way. Building on these well-separated gloss prototypes, an auxiliary similarity classifier is devised to strengthen the recognition cues, thus yielding more accurate results. Extensive experiments on three CSLR datasets show that the proposed CSGC significantly boosts CSLR performance, surpassing existing state-of-the-art works by large margins (i.e., 1.6% on PHOENIX14, 2.4% on PHOENIX14-T, and 5.7% on CSL-Daily).
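The prototype idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how prototypes are updated, which similarity measure is used, or how features are matched to gloss labels. The sketch below assumes a momentum (moving-average) update, cosine similarity with a temperature, a softmax-style contrastive loss over prototypes, and features that already carry a gloss label. The names GlossPrototypes, momentum, and temperature are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm (an all-zero vector is returned as-is)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


class GlossPrototypes:
    """One prototype per gloss category, shared across all training sentences."""

    def __init__(self, num_glosses, dim, momentum=0.9):
        self.protos = [[0.0] * dim for _ in range(num_glosses)]
        self.seen = [False] * num_glosses
        self.momentum = momentum  # assumed update rate; not given in the abstract

    def update(self, gloss_id, feature):
        """Move the prototype of gloss_id toward a new feature (assumed rule)."""
        f = normalize(feature)
        if not self.seen[gloss_id]:
            # The first feature seen for a gloss initializes its prototype.
            self.protos[gloss_id] = f
            self.seen[gloss_id] = True
            return
        m = self.momentum
        mixed = [m * p + (1 - m) * x for p, x in zip(self.protos[gloss_id], f)]
        self.protos[gloss_id] = normalize(mixed)

    def similarity_logits(self, feature, temperature=0.1):
        """Auxiliary similarity classifier: scaled cosine to every prototype."""
        return [cosine(feature, p) / temperature for p in self.protos]

    def contrastive_loss(self, feature, gloss_id, temperature=0.1):
        """Cross-entropy over prototype similarities: small when the feature is
        close to its own prototype and far from the others."""
        logits = self.similarity_logits(feature, temperature)
        mx = max(logits)
        log_z = mx + math.log(sum(math.exp(l - mx) for l in logits))
        return log_z - logits[gloss_id]


# Toy usage: three gloss categories with 2-D features.
bank = GlossPrototypes(num_glosses=3, dim=2)
bank.update(0, [1.0, 0.0])
bank.update(1, [0.0, 1.0])
bank.update(2, [-1.0, 0.0])

feat = [0.9, 0.1]  # a feature that should belong to gloss 0
logits = bank.similarity_logits(feat)
pred = max(range(len(logits)), key=lambda k: logits[k])
loss_correct = bank.contrastive_loss(feat, 0)
loss_wrong = bank.contrastive_loss(feat, 1)
```

In this toy setup the feature is closest to the prototype of gloss 0, so the similarity classifier picks gloss 0 and the loss is lower for the correct label than for a wrong one. Because the prototypes accumulate features from every sentence, the same gloss is pulled toward one shared representation regardless of the sentence it appears in.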
Pages: 4650-4658
Page count: 9