Automatic and Efficient Human Pose Estimation for Sign Language Videos

Cited by: 44

Authors:

Charles, James [1]

Pfister, Tomas [2]

Everingham, Mark [1]

Zisserman, Andrew [2]

Affiliations:

[1] Univ Leeds, Sch Comp, Leeds, W Yorkshire, England

[2] Univ Oxford, Dept Engn Sci, Parks Rd, Oxford OX1 3PJ, England

Source:

INTERNATIONAL JOURNAL OF COMPUTER VISION | 2014 / Volume 110 / Issue 01

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK;

Keywords:

Sign language; Human pose estimation; Co-segmentation; Random forest; RECOGNITION;

DOI:

10.1007/s11263-013-0672-6

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We present a fully automatic arm and hand tracker that detects joint positions over continuous sign language video sequences of more than an hour in length. To achieve this, we make contributions in four areas: (i) we show that the overlaid signer can be separated from the background TV broadcast using co-segmentation over all frames with a layered model; (ii) we show that joint positions (shoulders, elbows, wrists) can be predicted per-frame using a random forest regressor given only this segmentation and a colour model; (iii) we show that the random forest can be trained from an existing semi-automatic, but computationally expensive, tracker; and, (iv) introduce an evaluator to assess whether the predicted joint positions are correct for each frame. The method is applied to 20 signing footage videos with changing background, challenging imaging conditions, and for different signers. Our framework outperforms the state-of-the-art long term tracker by Buehler et al. (International Journal of Computer Vision 95:180-197, 2011), does not require the manual annotation of that work, and, after automatic initialisation, performs tracking in real-time. We also achieve superior joint localisation results to those obtained using the pose estimation method of Yang and Ramanan (Proceedings of the IEEE conference on computer vision and pattern recognition, 2011).

Pages: 70-90

Page count: 21

