Learning Human Pose Models from Synthesized Data for Robust RGB-D Action Recognition

Cited by: 27
Authors
Liu, Jian [1]
Rahmani, Hossein [2]
Akhtar, Naveed [1]
Mian, Ajmal [1]
Affiliations
[1] Univ Western Australia, Sch Comp Sci & Software Engn, 35 Stirling Highway, Crawley, WA 6009, Australia
[2] Univ Lancaster, Sch Comp & Commun, Lancaster, Lancs, England
Funding
Australian Research Council
Keywords
Human action recognition; Cross-view; Cross-subject; Depth sensor; CNN; GAN; HISTOGRAMS; VIEWS;
DOI
10.1007/s11263-019-01192-2
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We propose Human Pose Models that represent RGB and depth images of human poses independently of clothing textures, backgrounds, lighting conditions, body shapes and camera viewpoints. Learning such universal models requires training images in which all of these factors are varied for every human pose. Capturing such data is prohibitively expensive, so we develop a framework for synthesizing the training data. First, we learn representative human poses from a large corpus of real motion-capture human skeleton data. Next, we fit synthetic 3D humans with different body shapes to each pose and render each from 180 camera viewpoints while randomly varying the clothing textures, background and lighting. Generative Adversarial Networks are employed to minimize the gap between the synthetic and real image distributions. CNN models are then learned that transfer human poses to a shared high-level invariant space. The learned CNN models are used as invariant feature extractors for real RGB and depth frames of human action videos, and the temporal variations are modelled by a Fourier Temporal Pyramid. Finally, a linear SVM is used for classification. Experiments on three benchmark human action datasets show that our algorithm outperforms existing methods by significant margins for RGB-only and RGB-D action recognition.
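The temporal-modelling step named in the abstract can be illustrated with a minimal sketch of a Fourier Temporal Pyramid over per-frame feature vectors. This is not the authors' implementation: the abstract does not state the number of pyramid levels, the number of retained coefficients, or any normalization, so the function name `fourier_temporal_pyramid` and the parameters `levels` and `n_coeffs` are assumptions chosen for illustration. The input is assumed to be a (frames x dimensions) array such as CNN features extracted from each video frame.

```python
import numpy as np


def fourier_temporal_pyramid(features, levels=3, n_coeffs=4):
    """Build a fixed-length descriptor from a (T, D) sequence of frame features.

    Hypothetical sketch: `levels` and `n_coeffs` are illustrative defaults,
    not values taken from the paper.
    """
    features = np.asarray(features, dtype=float)
    if features.ndim != 2:
        raise ValueError("features must have shape (T, D)")
    n_frames = features.shape[0]
    if n_frames < 2 ** (levels - 1):
        raise ValueError("sequence too short for the requested number of levels")

    parts = []
    for level in range(levels):
        # Level 0 covers the whole sequence; level l splits it into 2**l segments.
        for segment in np.array_split(features, 2 ** level, axis=0):
            # Zero-pad short segments so the real FFT yields at least n_coeffs bins.
            n_fft = max(segment.shape[0], 2 * (n_coeffs - 1), 1)
            spectrum = np.fft.rfft(segment, n=n_fft, axis=0)
            # Keep the magnitudes of the lowest-frequency coefficients per dimension.
            parts.append(np.abs(spectrum[:n_coeffs]).ravel())
    return np.concatenate(parts)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video_features = rng.normal(size=(40, 5))  # 40 frames, 5-D features (made up)
    descriptor = fourier_temporal_pyramid(video_features)
    print(descriptor.shape)
```

The descriptor length is (2^levels - 1) * n_coeffs * D regardless of the number of frames, which is what lets videos of different lengths be passed to a single linear classifier such as the linear SVM mentioned in the abstract.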
Pages: 1545-1564
Page count: 20