Human gesture recognition of dynamic skeleton using graph convolutional networks

Cited by: 1
Authors
Liang, Wuyan [1 ]
Xu, Xiaolong [2 ]
Xiao, Fu [2 ]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Jiangsu Key Lab Big Data Secur & Intelligent Proc, Nanjing, Peoples R China
[2] Nanjing Univ Posts & Telecommun, Sch Comp Sci, Nanjing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
intelligent vision computing; graph convolutional networks; spatiotemporal correlations; dynamic gesture recognition; sign language recognition;
DOI
10.1117/1.JEI.32.2.021402
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline codes
0808; 0809;
Abstract
Intelligent vision computing has long been a fascinating field. With the rapid development of computer vision, dynamic gesture-based recognition systems have attracted significant attention. However, automatically recognizing skeleton-based human gestures in the form of sign language is complex and challenging. Most existing methods treat skeleton-based human gesture recognition as a standard video recognition problem, ignoring the rich structure information among both joints and gesture frames. Graph convolutional networks (GCNs) are a promising way to exploit this structure information and learn structure representations. However, adopting GCNs for such gesture sequences in both the spatial and temporal domains is challenging, because the underlying graph can be highly nonlinear and complex. To overcome this issue, we propose a spatiotemporal GCN model, called Aegles, that leverages spatiotemporal correlations to adaptively construct spatiotemporal graphs. Our method dynamically attends to the most significant spatiotemporal joints and constructs different graphs (spatial, temporal, and spatiotemporal), capturing the structure information in gesture sequences well. In addition, we introduce second-order information of the gesture skeleton data, i.e., the length and orientation of bones, to improve the representation of human hands and fingers. Furthermore, we use OpenPose to extract human gesture skeletons from public sign language datasets and obtain skeleton videos, building four skeleton-based sign language recognition datasets. Experimental results show that Aegles outperforms state-of-the-art methods and that the spatiotemporal correlations effectively boost the performance of human gesture recognition.
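The paper's Aegles model adaptively learns its graphs; the sketch below is only a generic illustration of two ingredients the abstract names: second-order bone features (length and orientation derived by subtracting parent from child joint coordinates) and a plain per-frame graph-convolution step over a fixed skeleton adjacency. The five-joint edge list, array shapes, and all names are assumptions for illustration, not the paper's actual configuration.

```python
import numpy as np

# Hypothetical skeleton: 5 joints of one hand, edges as (parent, child).
# The paper's actual joint set comes from OpenPose; this is illustrative.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4)]
NUM_JOINTS = 5

def bone_features(joints):
    """Second-order features: for each bone (parent, child), the vector
    child - parent encodes both length (its norm) and orientation."""
    # joints: (T, NUM_JOINTS, 2) array of 2D coordinates over T frames.
    bones = np.zeros_like(joints)
    for parent, child in EDGES:
        bones[:, child] = joints[:, child] - joints[:, parent]
    return bones

def normalized_adjacency():
    """Spatial graph: symmetric adjacency with self-loops, D^-1/2 A D^-1/2."""
    A = np.eye(NUM_JOINTS)
    for i, j in EDGES:
        A[i, j] = A[j, i] = 1.0
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A @ D_inv_sqrt

def gcn_layer(X, A_hat, W):
    """One graph-convolution step on a single frame: relu(A_hat @ X @ W)."""
    return np.maximum(A_hat @ X @ W, 0.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    joints = rng.normal(size=(10, NUM_JOINTS, 2))   # 10 frames of 2D joints
    bones = bone_features(joints)
    X = np.concatenate([joints, bones], axis=-1)    # joint + bone channels
    A_hat = normalized_adjacency()
    W = rng.normal(size=(X.shape[-1], 8))           # 4 -> 8 feature channels
    out = np.stack([gcn_layer(X[t], A_hat, W) for t in range(X.shape[0])])
    print(out.shape)                                # (10, 5, 8)
```

In Aegles the adjacency is constructed adaptively from spatiotemporal correlations rather than fixed as above; this sketch only shows the baseline operations such a model builds upon.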
Pages: 21