BSL-1K: Scaling Up Co-articulated Sign Language Recognition Using Mouthing Cues

Cited by: 91
Authors
Albanie, Samuel [1 ]
Varol, Gul [1 ]
Momeni, Liliane [1 ]
Afouras, Triantafyllos [1 ]
Chung, Joon Son [1 ,2 ]
Fox, Neil [3 ]
Zisserman, Andrew [1 ]
Affiliations
[1] Univ Oxford, Visual Geometry Grp, Oxford, England
[2] Naver Corp, Seoul, South Korea
[3] UCL, Deafness Cognit & Language Res Ctr, London, England
Source
COMPUTER VISION - ECCV 2020, PT XI | 2020 / Vol. 12356
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
Sign language recognition; Visual keyword spotting;
DOI
10.1007/978-3-030-58621-8_3
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405
Abstract
Recent progress in fine-grained gesture and action classification, and machine translation, points to the possibility of automated sign language recognition becoming a reality. A key stumbling block in making progress towards this goal is a lack of appropriate training data, stemming from the high complexity of sign annotation and a limited supply of qualified annotators. In this work, we introduce a new scalable approach to data collection for sign recognition in continuous videos. We make use of weakly-aligned subtitles for broadcast footage together with a keyword spotting method to automatically localise sign instances for a vocabulary of 1,000 signs in 1,000 hours of video. We make the following contributions: (1) We show how to use mouthing cues from signers to obtain high-quality annotations from video data; the result is the BSL-1K dataset, a collection of British Sign Language (BSL) signs of unprecedented scale. (2) We show that we can use BSL-1K to train strong sign recognition models for co-articulated signs in BSL, and that these models additionally form excellent pretraining for other sign languages and benchmarks; we exceed the state of the art on both the MSASL and WLASL benchmarks. Finally, (3) we propose new large-scale evaluation sets for the tasks of sign recognition and sign spotting, and provide baselines which we hope will serve to stimulate research in this area.
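The collection pipeline summarised above combines two weak signals: the subtitle tells us roughly *when* a keyword might be signed, and a visual keyword spotter scores each frame for the corresponding mouthing. A minimal sketch of how such a localisation step could work is given below; this is not the authors' code, and the function name, the padding value, and the confidence threshold are all illustrative assumptions.

```python
# Illustrative sketch (not the authors' pipeline): combine a weakly-aligned
# subtitle window with per-frame keyword-spotting scores to localise a sign
# instance. All names and default values here are assumptions.

from typing import List, Optional, Tuple


def localise_sign(
    scores: List[float],               # per-frame probability the keyword is mouthed
    subtitle_window: Tuple[int, int],  # (start_frame, end_frame) from the subtitle
    pad: int = 8,                      # assumed padding: subtitles align only loosely
    threshold: float = 0.5,            # assumed confidence cut-off for accepting a label
) -> Optional[int]:
    """Return the frame with the strongest mouthing cue inside the padded
    subtitle window, or None if no frame clears the threshold."""
    start = max(0, subtitle_window[0] - pad)
    end = min(len(scores), subtitle_window[1] + pad)
    if start >= end:
        return None
    best = max(range(start, end), key=lambda i: scores[i])
    return best if scores[best] >= threshold else None


# Toy example: the spotter fires most strongly at frame 6, inside the
# padded window around the subtitle interval (4, 8).
scores = [0.1, 0.2, 0.1, 0.3, 0.4, 0.7, 0.9, 0.6, 0.2, 0.1]
print(localise_sign(scores, subtitle_window=(4, 8), pad=1))  # -> 6
```

Thresholding on spotter confidence is what makes the approach scale: only high-precision automatic annotations are kept, trading recall for label quality.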
Pages: 35-53
Page count: 19