Classification of Multi-class Daily Human Motion using Discriminative Body Parts and Sentence Descriptions

Cited by: 8
Authors
Goutsu, Yusuke [1]
Takano, Wataru [2]
Nakamura, Yoshihiko [3]
Affiliations
[1] AIST, Comp Vis Res Grp, Cent 1,1-1-1 Umezono, Tsukuba, Ibaraki, Japan
[2] Osaka Univ, Ctr Math Modeling & Data Sci, 1-3 Machikaneyamacho, Toyonaka, Osaka, Japan
[3] Univ Tokyo, Mechanoinformat, Bunkyo Ku, 7-3-1 Hongo, Tokyo, Japan
Funding
Japan Society for the Promotion of Science
Keywords
Hidden Markov model; Fisher vector; Multiple kernel learning; Motion classification; Multi-class; Sentence description; Partial least squares; Action recognition; Pose; Imitation; Latency
DOI
10.1007/s11263-017-1053-3
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We propose a motion model that classifies human motions into specific categories by focusing on the parts of the human body that are discriminative for the target motions, and we apply this model to multi-class daily motion classification. We extend the model to a motion recognition system that generates multiple sentences associated with human motions. The motion model is evaluated on four datasets, acquired either by a Kinect sensor or by multiple infrared cameras in a motion capture studio: UCF-kinect, UT-kinect, HDM05-mocap, and YNL-mocap. We also evaluate the sentences generated from a dataset of motion and language pairs. The experimental results indicate that the motion model improves classification accuracy and that our approach outperforms other state-of-the-art methods on specific datasets, including those that contain human-object interactions with variations in motion duration, such as daily human motions. We achieve a classification rate of 81.1% for multi-class daily motion classification in a non-cross-subject setting. Additionally, the sentences generated by the motion recognition system are semantically and syntactically appropriate descriptions of the target motion, which may lead to human-robot interaction using natural language.
Pages: 495-514
Number of pages: 20