BSL-1K: Scaling Up Co-articulated Sign Language Recognition Using Mouthing Cues

Cited by: 91
Authors
Albanie, Samuel [1 ]
Varol, Gul [1 ]
Momeni, Liliane [1 ]
Afouras, Triantafyllos [1 ]
Chung, Joon Son [1 ,2 ]
Fox, Neil [3 ]
Zisserman, Andrew [1 ]
Affiliations
[1] Univ Oxford, Visual Geometry Grp, Oxford, England
[2] Naver Corp, Seoul, South Korea
[3] UCL, Deafness Cognit & Language Res Ctr, London, England
Source
COMPUTER VISION - ECCV 2020, PT XI | 2020 / Vol. 12356
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
Sign language recognition; Visual keyword spotting;
DOI
10.1007/978-3-030-58621-8_3
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405
Abstract
Recent progress in fine-grained gesture and action classification, and machine translation, points to the possibility of automated sign language recognition becoming a reality. A key stumbling block in making progress towards this goal is a lack of appropriate training data, stemming from the high complexity of sign annotation and a limited supply of qualified annotators. In this work, we introduce a new scalable approach to data collection for sign recognition in continuous videos. We make use of weakly-aligned subtitles for broadcast footage together with a keyword spotting method to automatically localise sign instances for a vocabulary of 1,000 signs in 1,000 hours of video. We make the following contributions: (1) We show how to use mouthing cues from signers to obtain high-quality annotations from video data; the result is the BSL-1K dataset, a collection of British Sign Language (BSL) signs of unprecedented scale. (2) We show that we can use BSL-1K to train strong sign recognition models for co-articulated signs in BSL, and that these models additionally form excellent pretraining for other sign languages and benchmarks; we exceed the state of the art on both the MSASL and WLASL benchmarks. Finally, (3) we propose new large-scale evaluation sets for the tasks of sign recognition and sign spotting, and provide baselines which we hope will serve to stimulate research in this area.
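The collection pipeline summarised above combines two weak signals: the subtitle tells us roughly *when* a keyword might be signed, and a visual keyword spotter scores each frame for the corresponding mouthing. A minimal sketch of how such a localisation step could work is given below; this is not the authors' code, and the function name, the padding value, and the confidence threshold are all illustrative assumptions.

```python
# Illustrative sketch (not the authors' pipeline): combine a weakly-aligned
# subtitle window with per-frame keyword-spotting scores to localise a sign
# instance. All names and default values here are assumptions.

from typing import List, Optional, Tuple


def localise_sign(
    scores: List[float],               # per-frame probability the keyword is mouthed
    subtitle_window: Tuple[int, int],  # (start_frame, end_frame) from the subtitle
    pad: int = 8,                      # assumed padding: subtitles align only loosely
    threshold: float = 0.5,            # assumed confidence cut-off for accepting a label
) -> Optional[int]:
    """Return the frame with the strongest mouthing cue inside the padded
    subtitle window, or None if no frame clears the threshold."""
    start = max(0, subtitle_window[0] - pad)
    end = min(len(scores), subtitle_window[1] + pad)
    if start >= end:
        return None
    best = max(range(start, end), key=lambda i: scores[i])
    return best if scores[best] >= threshold else None


# Toy example: the spotter fires most strongly at frame 6, inside the
# padded window around the subtitle interval (4, 8).
scores = [0.1, 0.2, 0.1, 0.3, 0.4, 0.7, 0.9, 0.6, 0.2, 0.1]
print(localise_sign(scores, subtitle_window=(4, 8), pad=1))  # -> 6
```

Thresholding on spotter confidence is what makes the approach scale: only high-precision automatic annotations are kept, trading recall for label quality.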
Pages: 35-53
Page count: 19