Learning Human-Object Interactions by Graph Parsing Neural Networks

Cited by: 415
Authors
Qi, Siyuan [1, 2]
Wang, Wenguan [1, 3]
Jia, Baoxiong [1, 4]
Shen, Jianbing [3, 5]
Zhu, Song-Chun [1, 2]
Affiliations
[1] Univ Calif Los Angeles, Los Angeles, CA USA
[2] Int Ctr AI & Robot Auton CARA, Los Angeles, CA USA
[3] Beijing Inst Technol, Beijing, Peoples R China
[4] Peking Univ, Beijing, Peoples R China
[5] Incept Inst Artificial Intelligence, Abu Dhabi, U Arab Emirates
Source
COMPUTER VISION - ECCV 2018, PT IX | 2018 / Volume 11213
Keywords
Human-object interaction; Message passing; Graph parsing; Neural networks; Affordances
DOI
10.1007/978-3-030-01240-3_25
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper addresses the task of detecting and recognizing human-object interactions (HOI) in images and videos. We introduce the Graph Parsing Neural Network (GPNN), a framework that incorporates structural knowledge while being differentiable end-to-end. For a given scene, GPNN infers a parse graph that includes (i) the HOI graph structure, represented by an adjacency matrix, and (ii) the node labels. Within a message passing inference framework, GPNN iteratively computes the adjacency matrices and node labels. We extensively evaluate our model on three HOI detection benchmarks on images and videos: the HICO-DET, V-COCO, and CAD-120 datasets. Our approach significantly outperforms state-of-the-art methods, verifying that GPNN is scalable to large datasets and applies to spatial-temporal settings.
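The iteration described in the abstract (alternately estimating a soft adjacency matrix and passing messages over it, then reading out node labels) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the functions involved, so the pairwise-similarity link step, the fixed tanh blend used as the update, the linear readout, and the names `parse_graph`, `node_feats`, and `label_weights` are all assumptions chosen for illustration, with no learned parameters.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def parse_graph(node_feats, label_weights, steps=3):
    """Hypothetical sketch: returns (soft adjacency matrix, per-node label probabilities).

    node_feats: list of n feature vectors of dimension d.
    label_weights: list of k weight vectors of dimension d (assumed linear readout).
    """
    n = len(node_feats)
    d = len(node_feats[0])
    hidden = [list(h) for h in node_feats]
    adjacency = [[0.0] * n for _ in range(n)]
    for _ in range(steps):
        # Link step (assumed form): edge strength from similarity of node states.
        adjacency = [
            [sigmoid(dot(hidden[i], hidden[j])) if i != j else 0.0 for j in range(n)]
            for i in range(n)
        ]
        # Message step: each node sums neighbour states weighted by edge strength.
        messages = [
            [sum(adjacency[i][j] * hidden[j][c] for j in range(n)) for c in range(d)]
            for i in range(n)
        ]
        # Update step (assumed form): fixed blend standing in for a learned update.
        hidden = [
            [math.tanh(0.5 * h + 0.5 * m) for h, m in zip(hidden[i], messages[i])]
            for i in range(n)
        ]
    # Readout step (assumed form): softmax over linear label scores per node.
    label_probs = [softmax([dot(w, h) for w in label_weights]) for h in hidden]
    return adjacency, label_probs


if __name__ == "__main__":
    feats = [[0.2, -0.1], [0.5, 0.3], [-0.4, 0.1]]  # three nodes, e.g. one human and two objects
    weights = [[1.0, 0.0], [0.0, 1.0]]              # two hypothetical labels
    adj, probs = parse_graph(feats, weights)
    print(adj)
    print(probs)
```

The returned pair corresponds to the two parts of the parse graph named in the abstract: the adjacency matrix and the node labels. Every operation here is differentiable, which is the property that would allow such a pipeline to be trained end-to-end if the fixed steps were replaced by learned ones.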
Pages: 407-423
Page count: 17