Hand pose estimation with multi-scale network

Cited by: 15
Authors
Hu, Zhongxu [1]
Hu, Youmin [1]
Wu, Bo [1]
Liu, Jie [1]
Han, Dongmin [2]
Kurfess, Thomas [2]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Mech Sci & Engn, Wuhan, Hubei, Peoples R China
[2] Georgia Inst Technol, George W Woodruff Sch Mech Engn, Atlanta, GA 30332 USA
Funding
National Key Research and Development Program of China;
Keywords
Hand pose estimation; Convolutional neural network; Multi-scale; End-to-end; Stair Rectified Linear Units; GESTURE;
DOI
10.1007/s10489-017-1092-z
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Hand pose estimation plays an important role in human-computer interaction. Because it is a high-dimensional nonlinear regression problem, the accuracy achieved by existing hand pose estimation methods is still unsatisfactory. With the development of deep neural networks, a growing number of researchers have adopted methods based on them. We propose a multi-scale convolutional neural network that operates on a single depth image of the hand. The network is end-to-end and directly computes the three-dimensional coordinates of the hand joints, and its multi-scale structure improves both the convergence speed and the output accuracy. In addition, an output function for the output layer, called Stair Rectified Linear Units, is used to limit the output values. Our experiments indicate that optimization with momentum is not suitable for hand pose estimation, because the task is an unstable regression. Finally, the proposed method achieves state-of-the-art performance on the NYU Hand Pose Dataset.
Pages: 2501-2515
Number of pages: 15