Detecting human-object interaction with multi-level pairwise feature network

Cited by: 16

Authors:

Liu, Hanchao [1]

Mu, Tai-Jiang [1]

Huang, Xiaolei [2]

Affiliations:

[1] Tsinghua Univ, Dept Comp Sci & Technol, BNRist, Key Lab Pervas Comp, Minist Educ, Beijing 100084, Peoples R China

[2] Penn State Univ, Coll Informat Sci & Technol, University Pk, PA 16802 USA

Source:

COMPUTATIONAL VISUAL MEDIA | 2021 / Vol. 7 / No. 02

Funding:

National Natural Science Foundation of China;

Keywords:

human-object interaction detection; pairwise feature network; deep learning; multi-level; object instance;

DOI:

10.1007/s41095-020-0188-2

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Human-object interaction (HOI) detection, which aims to infer ⟨human, action, object⟩ triplets within an image, is crucial for human-centric image understanding. Recent studies often exploit visual features and the spatial configuration of a human-object pair to learn the action linking the human and the object in the pair. We argue that such a paradigm of pairwise feature extraction and action inference can be applied not only at the whole human and object instance level, but also at the part level, at which a body part interacts with an object, and at the semantic level, by considering the semantic label of an object along with human appearance and human-object spatial configuration. We thus propose a multi-level pairwise feature network (PFNet) for detecting human-object interactions. The network consists of three parallel streams that characterize HOI using pairwise features at these three levels; the three streams are finally fused to give the action prediction. Extensive experiments show that our proposed PFNet outperforms other state-of-the-art methods on the V-COCO dataset and achieves results comparable to the state of the art on the HICO-DET dataset.

Citation

Pages: 229-239

Number of pages: 11
