Continuous Gesture Segmentation and Recognition Using 3DCNN and Convolutional LSTM

Cited by: 90
Authors
Zhu, Guangming [1 ]
Zhang, Liang [1 ]
Shen, Peiyi [1 ]
Song, Juan [1 ]
Shah, Syed Afaq Ali [2 ,3 ]
Bennamoun, Mohammed [2 ]
Affiliations
[1] Xidian Univ, Sch Comp Sci & Technol, Xian 710071, Shaanxi, Peoples R China
[2] Univ Western Australia, Perth, WA 6000, Australia
[3] Cent Queensland Univ, Rockhampton, Qld 4701, Australia
Funding
National Natural Science Foundation of China;
Keywords
Continuous gesture recognition; 3DCNN; convolutional LSTM; dilation; FUSION;
DOI
10.1109/TMM.2018.2869278
CLC number
TP [Automation Technology, Computer Technology];
Discipline code
0812;
Abstract
Continuous gesture recognition aims to recognize ongoing gestures from continuous gesture sequences; it is particularly relevant to practical applications, where the start and end frames of each gesture instance are generally unknown. This paper presents an effective deep architecture for continuous gesture recognition. First, continuous gesture sequences are segmented into isolated gesture instances using the proposed temporal dilated Res3D network. A balanced squared hinge loss function is proposed to deal with the imbalance between boundary and non-boundary frames. Temporal dilation preserves the temporal information needed for dense, fine-grained boundary detection, and the large temporal receptive field makes the segmentation results more reasonable and effective. Then, a recognition network is constructed from a 3-D convolutional neural network (3DCNN), a convolutional long short-term memory network (ConvLSTM), and a 2-D convolutional neural network (2DCNN) for isolated gesture recognition. The "3DCNN-ConvLSTM-2DCNN" architecture is more effective at learning long-term, deep spatiotemporal features. The proposed segmentation and recognition networks achieve a Jaccard index of 0.7163 on the ChaLearn LAP ConGD dataset, which is 0.106 higher than the winner of the 2017 ChaLearn LAP Large-Scale Continuous Gesture Recognition Challenge.
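The abstract's balanced squared hinge loss addresses the class imbalance between rare boundary frames and abundant non-boundary frames. The sketch below illustrates one plausible reading of that idea, reweighting each class's squared hinge terms by inverse class frequency; the paper's exact formulation may differ, and the function name and weighting scheme here are illustrative assumptions, not the authors' code.

```python
import numpy as np

def balanced_squared_hinge_loss(scores, labels):
    """Illustrative sketch of a class-balanced squared hinge loss.

    scores: real-valued boundary scores per frame, shape (T,)
    labels: +1 for boundary frames, -1 for non-boundary frames

    Boundary frames are far rarer than non-boundary frames, so each
    class's hinge terms are reweighted by the inverse of its frame
    count, making both classes contribute equally to the loss.
    (Assumed form; the paper may use a different normalization.)
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    pos = labels > 0
    neg = ~pos
    # Squared hinge penalty per frame: zero once the margin exceeds 1.
    per_frame = np.maximum(0.0, 1.0 - labels * scores) ** 2
    # Inverse-frequency weights (guarding against an empty class).
    w = np.where(pos, 1.0 / max(pos.sum(), 1), 1.0 / max(neg.sum(), 1))
    return float((w * per_frame).sum() / 2.0)

# A perfectly separated sequence incurs zero loss even though
# non-boundary frames outnumber boundary frames 3 to 1.
print(balanced_squared_hinge_loss([2, -2, -2, -2], [1, -1, -1, -1]))
```

With unweighted averaging, the dominant non-boundary class would swamp the gradient signal from the few boundary frames; the per-class weights are one standard remedy for that imbalance.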
Pages: 1011-1021
Number of pages: 11