Spatio-Temporal Matching for Human Pose Estimation in Video

Times cited: 18

Authors:

Zhou, Feng [1]

De la Torre, Fernando [1]

Affiliation:

[1] Carnegie Mellon Univ, Inst Robot, Pittsburgh, PA 15213 USA

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2016 / Vol. 38 / No. 8

Funding:

National Science Foundation (USA)

Keywords:

Human pose estimation; dense trajectories; spatio-temporal bilinear model; trajectory matching; body motion capture; 3D human pose; multiple; tracking

DOI:

10.1109/TPAMI.2016.2526002

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Detecting and tracking humans in videos are long-standing problems in computer vision. Most successful approaches (e.g., deformable parts models) rely heavily on discriminative models to build appearance detectors for body joints and on generative models to constrain possible body configurations (e.g., trees). While these 2D models have been applied successfully to images (and with less success to videos), a major challenge is generalizing them to cope with changes in camera view. To achieve view invariance, these 2D models typically require a large amount of training data across views, which is difficult to gather and time-consuming to label. Unlike existing 2D models, this paper formulates human detection in videos as spatio-temporal matching (STM) between a 3D motion capture model and trajectories in videos. Our algorithm estimates the camera view and selects a subset of tracked trajectories that matches the motion of the 3D model. STM is solved efficiently with linear programming and is robust to tracking mismatches, occlusions and outliers. To the best of our knowledge, this is the first paper that solves the correspondence between video and 3D motion capture data for human pose detection. Experiments on the CMU motion capture, Human3.6M, Berkeley MHAD and CMU MAD databases illustrate the benefits of our method over state-of-the-art approaches.


Pages: 1492-1504

Page count: 13

Number of references: 51

