Attention-Based 3D-CNNs for Large-Vocabulary Sign Language Recognition

Cited by: 132
Authors
Huang, Jie [1]
Zhou, Wengang [1]
Li, Houqiang [1]
Li, Weiping [1]
Affiliations
[1] Univ Sci & Technol China, Dept Elect Engn & Informat Sci, Hefei 230027, Anhui, Peoples R China
Keywords
Sign language recognition; 3D convolutional neural networks; attention mechanism; deep learning; SYSTEM; MODEL
DOI
10.1109/TCSVT.2018.2870740
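Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification code
0808; 0809
Abstract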
Sign language recognition (SLR) is an important and challenging research topic in the multimedia field. Conventional techniques for SLR rely on hand-crafted features, which achieve limited success. In this paper, we present attention-based 3D-convolutional neural networks (3D-CNNs) for SLR. The framework has two advantages: 3D-CNNs learn spatio-temporal features from raw video without prior knowledge and the attention mechanism helps to select the clue. When training 3D-CNN for capturing spatio-temporal features, spatial attention is incorporated into the network to focus on the areas of interest. After feature extraction, temporal attention is utilized to select the significant motions for classification. The proposed method is evaluated on two large scale sign language data sets. The first one, collected by ourselves, is a Chinese sign language data set that consists of 500 categories. The other is the ChaLearn14 benchmark. The experiment results demonstrate the effectiveness of our approach compared with state-of-the-art algorithms.
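The temporal attention step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each video has already been split into clips and reduced to one feature vector per clip, and it scores clips with a dot product against a hypothetical learned query vector. The names `temporal_attention_pool`, `clip_features`, and `query` are illustrative only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(clip_features, query):
    """Weight per-clip features by attention and sum them.

    clip_features: list of equal-length feature vectors, one per clip
                   (assumed to come from a 3D-CNN; not computed here).
    query: hypothetical learned vector used to score each clip.
    Returns (pooled_feature, attention_weights).
    """
    # Score each clip by its dot product with the query vector.
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in clip_features]
    weights = softmax(scores)
    dim = len(clip_features[0])
    # Attention-weighted sum over the time axis.
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, clip_features))
        for d in range(dim)
    ]
    return pooled, weights


# Toy example: three clips with 2-dimensional features.
clips = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = temporal_attention_pool(clips, query=[1.0, 1.0])
```

In this toy input the third clip receives the largest weight because it has the highest score against the query, so it contributes most to the pooled feature that a classifier would consume.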
Pages: 2822-2832
Number of pages: 11