Sign Transition Modeling and a Scalable Solution to Continuous Sign Language Recognition for Real-World Applications

Cited by: 44
Authors
Li, Kehuang [1]
Zhou, Zhengyu [2, 3]
Lee, Chin-Hui [1]
Affiliations
[1] Georgia Inst Technol, Sch Elect & Comp Engn, 777 Atlantic Dr NW, Atlanta, GA 30332 USA
[2] Robert Bosch LLC, Res & Technol Ctr, Stuttgart, Germany
[3] Bosch Res & Technol Ctr North Amer, 4005 Miranda Ave,200, Palo Alto, CA 94304 USA
Keywords
Sign language recognition; transition modeling; speech recognition; hidden Markov models
DOI
10.1145/2850421
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
We propose a new approach to modeling transition information between signs in continuous Sign Language Recognition (SLR) and address some scalability issues in designing SLR systems. In contrast to Automatic Speech Recognition (ASR), in which the transition between speech sounds is often brief and mainly addressed by the coarticulation effect, the sign transition in continuous SLR is far from clear and usually cannot be characterized easily or exactly. Leveraging hidden Markov modeling techniques from ASR, we propose a modeling framework for continuous SLR with the following major advantages: (i) the system is easy to scale up to large-vocabulary SLR; (ii) modeling of signs as well as the transitions between signs is robust even for noisy data collected in real-world SLR; and (iii) extensions to training, decoding, and adaptation are directly applicable even with new deep learning algorithms. A pair of low-cost digital gloves affordable for the deaf and hard of hearing community is used to collect training and testing data for real-world SLR interaction applications. Evaluated on 1,024 testing sentences from five signers, the system achieves a word accuracy rate of 87.4% using a vocabulary of 510 words. Recognition runs in real time, requiring an average of 0.69 s per sentence. The encouraging results indicate that it is feasible to develop real-world SLR applications based on the proposed SLR framework.
Pages: 23