Hierarchical Reasoning Network for Human-Object Interaction Detection

Cited by: 13

Authors:

Gao, Yiming [1]

Kuang, Zhanghui [2]

Li, Guanbin [1]

Zhang, Wayne [2]

Lin, Liang [1]

Affiliations:

[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou 510006, Peoples R China

[2] SenseTime Res, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2021 / Vol. 30

Funding:

National Natural Science Foundation of China;

Keywords:

Visualization; Cognition; Correlation; Benchmark testing; Task analysis; Sports; Periodic structures; Human-object interaction; hierarchical reasoning network; graph neural network; REPRESENTATION; CNNS;

DOI:

10.1109/TIP.2021.3093784

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Human-object interaction detection that aims at detecting <human, verb, object> triplets is critical for the holistic human-centric scene understanding. Existing approaches ignore the modeling of correlations among hierarchical human parts and objects. In this work, we introduce a Hierarchical Reasoning Network (HRNet) to capture relations among human parts at multiple scales (including the holistic human, human region, and human keypoint levels) and objects via a unified graph. In particular, HRNet first constructs one multi-level human parts graph, each level of which consists of human parts at one specific scale, objects, and the unions of human part-object pairs as nodes, and their mutual visual and spatial layout relations as intra-level reasoning. To also capture the relations across scales, we further introduce inter-level reasoning between the nodes of two consecutive levels based on the prior of human body structure. The representations of graph nodes are propagated along intra-level and inter-level reasoning in turn during reasoning. Extensive experiments demonstrate our HRNet obtains new state-of-the-art results on three challenging HICO-DET, V-COCO and HOI-A benchmarks, validating the compelling effectiveness of the proposed method.
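The alternation between intra-level and inter-level reasoning described in the abstract can be illustrated with a toy sketch. This is not the authors' HRNet: it assumes plain mean aggregation instead of learned graph layers, omits the union nodes and spatial layout relations, and every name in it (`intra_level_step`, `inter_level_step`, `reason`, `mix`, the `alpha` blend weight, and the node labels) is hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Node = Tuple[int, str]  # (level index, node name); hypothetical naming


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    return [sum(vals) / len(vectors) for vals in zip(*vectors)]


def mix(own: Vector, message: Vector, alpha: float = 0.5) -> Vector:
    """Blend a node's own feature with an incoming message (assumed rule)."""
    return [(1 - alpha) * o + alpha * m for o, m in zip(own, message)]


def intra_level_step(feats: Dict[Node, Vector],
                     levels: List[List[str]]) -> Dict[Node, Vector]:
    """Each node aggregates the other nodes of its own level."""
    new = dict(feats)
    for level, names in enumerate(levels):
        for name in names:
            others = [feats[(level, n)] for n in names if n != name]
            if others:
                new[(level, name)] = mix(feats[(level, name)], mean(others))
    return new


def inter_level_step(feats: Dict[Node, Vector],
                     links: List[Tuple[Node, Node]]) -> Dict[Node, Vector]:
    """Each node aggregates the nodes it is linked to in an adjacent level."""
    neighbours: Dict[Node, List[Node]] = {}
    for a, b in links:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    new = dict(feats)
    for node, linked in neighbours.items():
        new[node] = mix(feats[node], mean([feats[n] for n in linked]))
    return new


def reason(feats: Dict[Node, Vector], levels: List[List[str]],
           links: List[Tuple[Node, Node]], rounds: int = 2) -> Dict[Node, Vector]:
    """Alternate intra-level and inter-level propagation for a few rounds."""
    for _ in range(rounds):
        feats = intra_level_step(feats, levels)
        feats = inter_level_step(feats, links)
    return feats


# Three scales: holistic human, human regions, human keypoints (plus an object).
levels = [["human", "object"],
          ["upper_body", "lower_body", "object"],
          ["wrist", "ankle", "object"]]
# Body-structure prior: links only between consecutive levels.
links = [((0, "human"), (1, "upper_body")),
         ((0, "human"), (1, "lower_body")),
         ((1, "upper_body"), (2, "wrist")),
         ((1, "lower_body"), (2, "ankle"))]
```

In this toy setup a signal placed on the keypoint-level `wrist` node reaches the holistic `human` node only through the region-level `upper_body` node, which is the role the abstract assigns to inter-level reasoning between consecutive levels.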


Pages: 8306-8317

Number of pages: 12
