Cross-Sentence Gloss Consistency for Continuous Sign Language Recognition

Cited by: 0

Authors:

Rao, Qi [1]

Sun, Ke [2]

Wang, Xiaohan [3]

Wang, Qi [2]

Zhang, Bang [2]

Affiliations:

[1] Univ Technol Sydney, ReLER, AAII, Ultimo, NSW, Australia

[2] Alibaba Grp, Inst Intelligent Comp, Hangzhou, Peoples R China

[3] Stanford Univ, Stanford, CA USA

Source:

THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 5 | 2024

Keywords:

FRAMEWORK;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Continuous sign language recognition (CSLR) aims to recognize gloss sequences from continuous sign videos. Recent works enhance the gloss representation consistency by mining correlations between visual and contextual modules within individual sentences. However, much richer correlations remain among glosses across different sentences. In this paper, we present a simple yet effective Cross-Sentence Gloss Consistency (CSGC) method, which enforces glosses belonging to the same category to be more consistent in representation than those belonging to different categories, across all training sentences. Specifically, in CSGC, a prototype is maintained for each gloss category and benefits the gloss discrimination in a contrastive way. Thanks to the well-distinguished gloss prototypes, an auxiliary similarity classifier is devised to enhance the recognition clues, thus yielding more accurate results. Extensive experiments conducted on three CSLR datasets show that our proposed CSGC significantly boosts the performance of CSLR, surpassing existing state-of-the-art works by large margins (i.e., 1.6% on PHOENIX14, 2.4% on PHOENIX14-T, and 5.7% on CSL-Daily).
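
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: one prototype per gloss category, a contrastive objective that pulls a gloss feature toward its own prototype, and a similarity-based classifier over the prototypes. The exponential-moving-average update, the cosine similarity, the temperature value, and all function names are assumptions made for illustration and may differ from the authors' actual formulation.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine_logits(feature, prototypes, temperature=0.1):
    """Cosine similarity between one feature and every prototype, divided by a temperature."""
    f = l2_normalize(feature)
    return [
        sum(a * b for a, b in zip(f, l2_normalize(p))) / temperature
        for p in prototypes
    ]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def update_prototype(prototypes, label, feature, momentum=0.9):
    """Assumed update rule: exponential moving average of normalized features, in place."""
    f = l2_normalize(feature)
    prototypes[label] = [
        momentum * p + (1.0 - momentum) * x
        for p, x in zip(prototypes[label], f)
    ]


def prototype_contrastive_loss(feature, label, prototypes, temperature=0.1):
    """Cross-entropy over prototype similarities: low when the feature is
    closer to its own category's prototype than to the others."""
    probs = softmax(cosine_logits(feature, prototypes, temperature))
    return -math.log(probs[label])


def similarity_classify(feature, prototypes, temperature=0.1):
    """Auxiliary classifier: index of the most similar prototype."""
    probs = softmax(cosine_logits(feature, prototypes, temperature))
    return max(range(len(probs)), key=probs.__getitem__)
```

In this sketch, prototypes are shared across all training sentences, so a gloss feature from one sentence is compared against statistics accumulated from every other sentence that contains the same gloss category.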

Citation

Pages: 4650-4658

Page count: 9
