Attribute And-Or Grammar for Joint Parsing of Human Pose, Parts and Attributes

Cited by: 51
Authors
Park, Seyoung [1]
Nie, Bruce Xiaohan [2]
Zhu, Song-Chun [3]
Affiliations
[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA
[2] Univ Calif Los Angeles, Dept Stat, Los Angeles, CA 90095 USA
[3] Univ Calif Los Angeles, Dept Stat & Comp Sci, Los Angeles, CA 90095 USA
Funding
U.S. National Science Foundation
Keywords
Attribute grammar; And-Or grammar; attribute prediction; pose estimation; part localization; joint parsing; PICTORIAL STRUCTURES; FLEXIBLE MIXTURES; RECOGNITION; REPRESENTATION;
DOI
10.1109/TPAMI.2017.2731842
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents an attribute and-or grammar (A-AOG) model for jointly inferring human body pose and human attributes in a parse graph, with attributes attached to the nodes of the hierarchical representation. In contrast to other popular methods in the current literature, which train separate classifiers for poses and individual attributes, our method explicitly represents the decomposition and articulation of body parts and accounts for the correlations between poses and attributes. The A-AOG model is an amalgamation of three traditional grammar formulations: (i) phrase structure grammar, representing the hierarchical decomposition of the human body from whole to parts; (ii) dependency grammar, modeling the geometric articulation by a kinematic graph of the body pose; and (iii) attribute grammar, accounting for the compatibility relations between different parts in the hierarchy so that their appearances follow a consistent style. The parse graph outputs human detection, pose estimation, and attribute prediction simultaneously, and these outputs are intuitive and interpretable. We conduct experiments on two tasks on two datasets, and the experimental results demonstrate the advantage of joint modeling over computing poses and attributes independently. Furthermore, our model performs better than existing methods on both the pose estimation and the attribute prediction tasks.
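The joint-inference idea in the abstract can be illustrated with a toy sketch. This is not the authors' A-AOG implementation: the `Node` class, the `best_parse` and `joint_parse` functions, the node names, and all scores are hypothetical, and the sketch assumes a single global attribute and omits the geometric articulation terms entirely. It only shows how And-nodes (decomposition), Or-nodes (alternative part configurations), and attribute-dependent scores can be maximized together in one parse.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    name: str
    kind: str  # "and" (use all children), "or" (pick one child), or "leaf"
    children: List["Node"] = field(default_factory=list)
    # Hypothetical appearance scores of this node under each attribute value.
    attr_scores: Dict[str, float] = field(default_factory=dict)


def best_parse(node: Node, attr: str) -> Tuple[float, List[str]]:
    """Best score and selected node names under one fixed attribute value."""
    own = node.attr_scores.get(attr, 0.0)
    if node.kind == "leaf":
        return own, [node.name]
    results = [best_parse(child, attr) for child in node.children]
    if node.kind == "or":
        # Or-node: keep only the highest-scoring alternative.
        score, names = max(results, key=lambda r: r[0])
        return own + score, [node.name] + names
    # And-node: all children are part of the parse, so scores add up.
    total = own + sum(score for score, _ in results)
    names = [node.name] + [n for _, ns in results for n in ns]
    return total, names


def joint_parse(root: Node, attrs: List[str]) -> Tuple[str, float, List[str]]:
    """Maximize jointly over the attribute value and the part configuration."""
    scored = [(attr,) + best_parse(root, attr) for attr in attrs]
    return max(scored, key=lambda t: t[1])


# Toy grammar with made-up scores: body -> torso AND arm; arm -> straight OR bent.
arm = Node("arm", "or", [
    Node("arm_straight", "leaf",
         attr_scores={"long_sleeve": 0.2, "short_sleeve": 0.9}),
    Node("arm_bent", "leaf",
         attr_scores={"long_sleeve": 0.6, "short_sleeve": 0.1}),
])
torso = Node("torso", "leaf",
             attr_scores={"long_sleeve": 0.5, "short_sleeve": 0.3})
body = Node("body", "and", [torso, arm])

attr, score, names = joint_parse(body, ["long_sleeve", "short_sleeve"])
print(attr, names)
```

With these made-up numbers, the preferred arm configuration changes with the attribute value (bent under `long_sleeve`, straight under `short_sleeve`), which is the kind of pose/attribute coupling that separate classifiers would not capture.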
Pages: 1555-1569
Page count: 15