Efficient and Precise Interactive Hand Tracking Through Joint, Continuous Optimization of Pose and Correspondences

Cited by: 201
Authors
Taylor, Jonathan [1]
Bordeaux, Lucas [1]
Cashman, Thomas [1]
Corish, Bob [1]
Keskin, Cem [1]
Sharp, Toby [1]
Soto, Eduardo [1, 2]
Sweeney, David [1]
Valentin, Julien [1, 3]
Luff, Benjamin [1, 4]
Topalian, Arran [1, 4]
Wood, Erroll [1, 5]
Khamis, Sameh [1]
Kohli, Pushmeet [1]
Izadi, Shahram [1]
Banks, Richard [1]
Fitzgibbon, Andrew [1]
Shotton, Jamie [1]
Affiliations
[1] Microsoft Res, Redmond, WA 98052 USA
[2] McMaster Univ, Hamilton, ON L8S 4L8, Canada
[3] Univ Oxford, Oxford OX1 2JD, England
[4] Univ Abertay, Dundee, Scotland
[5] Univ Cambridge, Cambridge CB2 1TN, England
Source
ACM TRANSACTIONS ON GRAPHICS | 2016 / Vol. 35 / No. 04
Keywords
articulated tracking; virtual reality; subdivision surfaces;
DOI
10.1145/2897824.2925965
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Fully articulated hand tracking promises to enable fundamentally new interactions with virtual and augmented worlds, but the limited accuracy and efficiency of current systems has prevented widespread adoption. Today's dominant paradigm uses machine learning for initialization and recovery followed by iterative model-fitting optimization to achieve a detailed pose fit. We follow this paradigm, but make several changes to the model-fitting, namely using: (1) a more discriminative objective function; (2) a smooth-surface model that provides gradients for non-linear optimization; and (3) joint optimization over both the model pose and the correspondences between observed data points and the model surface. While each of these changes may actually increase the cost per fitting iteration, we find a compensating decrease in the number of iterations. Further, the wide basin of convergence means that fewer starting points are needed for successful model fitting. Our system runs in real-time on CPU only, which frees up the commonly over-burdened GPU for experience designers. The hand tracker is efficient enough to run on low-power devices such as tablets. We can track up to several meters from the camera to provide a large working volume for interaction, even using the noisy data from current-generation depth cameras. Quantitative assessments on standard datasets show that the new approach exceeds the state of the art in accuracy. Qualitative results take the form of live recordings of a range of interactive experiences enabled by this new approach.
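The third change named in the abstract, joint optimization over the model pose and the correspondences, can be illustrated with a toy problem. The sketch below is not the authors' implementation. It assumes a 2D circle of known radius in place of the smooth hand surface, takes the circle center as the "pose" and one surface angle per data point as the "correspondence", and updates all of them together with a basic Levenberg-Marquardt loop. The names `residuals`, `jacobian`, and `fit_joint` are hypothetical.

```python
import numpy as np


def residuals(params, data, radius):
    """Stacked 2D residuals between data points and their surface points."""
    center, u = params[:2], params[2:]
    model = center + radius * np.stack([np.cos(u), np.sin(u)], axis=1)
    return (data - model).ravel()  # ordered [x0, y0, x1, y1, ...]


def jacobian(params, data, radius):
    """Analytic Jacobian of the residuals w.r.t. center and correspondences."""
    n = data.shape[0]
    u = params[2:]
    J = np.zeros((2 * n, 2 + n))
    J[0::2, 0] = -1.0  # d(x residual) / d(center x)
    J[1::2, 1] = -1.0  # d(y residual) / d(center y)
    idx = np.arange(n)
    J[2 * idx, 2 + idx] = radius * np.sin(u)       # d(x residual) / d(u_i)
    J[2 * idx + 1, 2 + idx] = -radius * np.cos(u)  # d(y residual) / d(u_i)
    return J


def fit_joint(data, radius, center0, iters=100, lam=1e-3):
    """Jointly refine the center (pose) and per-point angles (correspondences)."""
    center0 = np.asarray(center0, dtype=float)
    # Initialize each correspondence at the closest surface point.
    u0 = np.arctan2(data[:, 1] - center0[1], data[:, 0] - center0[0])
    params = np.concatenate([center0, u0])
    cost = np.sum(residuals(params, data, radius) ** 2)
    for _ in range(iters):
        r = residuals(params, data, radius)
        J = jacobian(params, data, radius)
        H = J.T @ J
        step = np.linalg.solve(H + lam * np.eye(H.shape[0]), -J.T @ r)
        candidate = params + step
        new_cost = np.sum(residuals(candidate, data, radius) ** 2)
        if new_cost < cost:
            # Accept the step and trust the quadratic model more.
            params, cost, lam = candidate, new_cost, lam * 0.5
        else:
            # Reject the step and increase damping.
            lam *= 10.0
    return params[:2], params[2:], cost


if __name__ == "__main__":
    t = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
    true_center = np.array([1.0, -2.0])
    data = true_center + 2.0 * np.stack([np.cos(t), np.sin(t)], axis=1)
    center, u, cost = fit_joint(data, radius=2.0, center0=[0.5, -1.5])
    print(center, cost)
```

Because the circle is smooth, every residual is differentiable in both the center and the angles, so a single linear solve moves pose and correspondences at once. This contrasts with ICP-style alternation, which re-computes closest points and then re-solves for pose in separate steps.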
Pages: 12