A large-scale combinatorial benchmark for sign language recognition

Cited: 0
Authors
Gao, Liqing [1 ]
Wan, Liang [1 ]
Hu, Lianyu [1 ]
Han, Ruize [1 ]
Liu, Zekang [1 ]
Shi, Peng [1 ]
Shang, Fanhua [1 ]
Feng, Wei [1 ]
Institutions
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin 300350, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Sign language recognition; T and E disassemble-and-reassemble strategy; Cost-controllable large-scale dataset; Combinatorial framework;
DOI
10.1016/j.patcog.2024.111246
CLC number
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Lacking a large-scale dataset is the major obstacle preventing sign language recognition (SLR) from working well in the real world, owing to the huge cost of collecting and annotating sign language videos. This paper rethinks a sign language sentence as a combination of a template (T) and an entity (E) and presents a novel T and E Disassemble-and-reAssemble (TEDA) strategy that collects T and E sign videos independently. The proposed TEDA strategy can theoretically generate T × E effective samples at only T + E collection and annotation cost. With the TEDA strategy, we build a cost-controllable large-scale (CCLS) sign language dataset of 300,400 combinatorial samples, generated from 6,000 T videos and 29,700 E videos. To enable training arbitrary SLR models on combinatorial data, we propose a combinatorial SLR framework. Specifically, we first design a dynamic combination module that dynamically combines independent T and E features into combinatorial features. We then propose a joint constraint module that keeps the distribution of the combinatorial features as close as possible to that of the complete features. Finally, we develop a multi-stage training strategy to accommodate SLR learning on the combinatorial data. Extensive experiments demonstrate both the rationality of the TEDA strategy in generating large-scale effective combinatorial samples and the effectiveness of the combinatorial framework in promoting SLR.
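The core counting argument of the abstract, that collecting T template videos and E entity videos independently (a T + E collection cost) can yield up to T × E reassembled sentence samples, can be sketched with a toy text analogy. All templates and entities below are hypothetical placeholders for illustration only, not items from the CCLS dataset:

```python
from itertools import product

# Disassemble: templates with an entity slot, and entities, collected separately.
templates = ["I want to buy <E>", "Where is <E>", "Please bring <E>"]  # T = 3
entities = ["water", "a ticket", "medicine", "bread"]                  # E = 4

# Reassemble: pair every template with every entity.
combined = [t.replace("<E>", e) for t, e in product(templates, entities)]

print(len(templates) + len(entities))  # collection cost: 7 items (T + E)
print(len(combined))                   # effective samples: 12 (T x E)
```

The same multiplicative payoff motivates the paper's video setting: the collection and annotation effort grows additively in T and E, while the pool of effective combinatorial training samples grows multiplicatively.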
Pages: 14