Detecting human-object interaction with multi-level pairwise feature network

Cited by: 16
Authors
Liu, Hanchao [1]
Mu, Tai-Jiang [1]
Huang, Xiaolei [2]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, BNRist, Key Lab Pervas Comp,Minist Educ, Beijing 100084, Peoples R China
[2] Penn State Univ, Coll Informat Sci & Technol, University Pk, PA 16802 USA
Funding
National Natural Science Foundation of China;
Keywords
human-object interaction detection; pairwise feature network; deep learning; multi-level; object instance;
DOI
10.1007/s41095-020-0188-2
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
Human-object interaction (HOI) detection, which aims to infer ⟨human, action, object⟩ triplets within an image, is crucial for human-centric image understanding. Recent studies often exploit the visual features and spatial configuration of a human-object pair to learn the action linking the human and the object. We argue that this paradigm of pairwise feature extraction and action inference applies not only at the instance level, to the whole human and object, but also at the part level, where a body part interacts with an object, and at the semantic level, where the semantic label of the object is considered together with human appearance and human-object spatial configuration. We thus propose a multi-level pairwise feature network (PFNet) for detecting human-object interactions. The network consists of three parallel streams that characterize HOI using pairwise features at these three levels; the three streams are then fused to give the action prediction. Extensive experiments show that PFNet outperforms other state-of-the-art methods on the V-COCO dataset and achieves results comparable to the state of the art on the HICO-DET dataset.
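The fusion of three parallel streams described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the streams are fused, so the rule used here (summing per-action logits and applying a sigmoid) is an assumption, and the function name `fuse_streams`, the action names, and all numbers are hypothetical placeholders rather than anything taken from PFNet.

```python
import math


def sigmoid(x):
    """Map a real-valued logit to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_streams(instance_logits, part_logits, semantic_logits):
    """Fuse per-action logits from three streams into per-action scores.

    ASSUMPTION: late fusion by summing logits, then a sigmoid. The
    abstract only states that the streams are fused; the actual rule
    used by the authors is not given there.
    """
    if not (len(instance_logits) == len(part_logits) == len(semantic_logits)):
        raise ValueError("all streams must score the same set of actions")
    return [
        sigmoid(i + p + s)
        for i, p, s in zip(instance_logits, part_logits, semantic_logits)
    ]


# Hypothetical logits for one human-object pair over three candidate actions.
actions = ["ride", "hold", "kick"]
scores = fuse_streams(
    instance_logits=[2.0, -1.0, -3.0],   # whole human and object
    part_logits=[1.0, 0.5, -2.0],        # body part and object
    semantic_logits=[0.5, -0.5, -1.0],   # object label, appearance, layout
)
best = actions[scores.index(max(scores))]
```

Summing logits lets each stream raise or lower the evidence for an action independently; a product of per-stream probabilities or a learned fusion layer would be equally plausible readings of the abstract.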
Pages: 229-239
Page count: 11