Hierarchical Reasoning Network for Human-Object Interaction Detection

Cited by: 13

Authors:

Gao, Yiming [1]

Kuang, Zhanghui [2]

Li, Guanbin [1]

Zhang, Wayne [2]

Lin, Liang [1]

Affiliations:

[1] Sun Yat-sen University, School of Computer Science and Engineering, Guangzhou 510006, People's Republic of China

[2] SenseTime Research, Hong Kong, People's Republic of China

Source:

IEEE Transactions on Image Processing | 2021 / Vol. 30

Funding:

National Natural Science Foundation of China;

Keywords:

Visualization; Cognition; Correlation; Benchmark testing; Task analysis; Sports; Periodic structures; Human-object interaction; hierarchical reasoning network; graph neural network; REPRESENTATION; CNNS;

DOI:

10.1109/TIP.2021.3093784

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Human-object interaction detection, which aims at detecting <human, verb, object> triplets, is critical for holistic human-centric scene understanding. Existing approaches ignore the modeling of correlations among hierarchical human parts and objects. In this work, we introduce a Hierarchical Reasoning Network (HRNet) to capture relations among human parts at multiple scales (the holistic human, human region, and human keypoint levels) and objects via a unified graph. In particular, HRNet first constructs a multi-level human parts graph. Each level takes as nodes the human parts at one specific scale, the objects, and the unions of human part-object pairs, and uses their mutual visual and spatial layout relations for intra-level reasoning. To also capture relations across scales, we further introduce inter-level reasoning between the nodes of two consecutive levels, based on the prior of human body structure. During reasoning, the representations of graph nodes are propagated along intra-level and inter-level connections in turn. Extensive experiments demonstrate that HRNet obtains new state-of-the-art results on three challenging benchmarks, HICO-DET, V-COCO and HOI-A, validating the effectiveness of the proposed method.
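The alternating propagation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no update rule, so plain averaging stands in for whatever learned message function HRNet uses. The names `intra_level_step`, `inter_level_step`, `reason`, the mixing weight `alpha`, and the node names are all hypothetical, and the union nodes for human part-object pairs are omitted for brevity.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Level = Dict[str, Vector]          # node name -> feature vector
Link = Tuple[int, str, str]        # (coarse level index, coarse node, fine node)


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def mix(own: Vector, message: Vector, alpha: float = 0.5) -> Vector:
    """Blend a node's own feature with an incoming message (assumed rule)."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(own, message)]


def intra_level_step(level: Level) -> Level:
    """Each node receives the mean feature of the other nodes in its level."""
    updated = {}
    for name, feat in level.items():
        others = [f for other, f in level.items() if other != name]
        updated[name] = mix(feat, mean(others)) if others else feat
    return updated


def inter_level_step(levels: List[Level], links: List[Link]) -> List[Level]:
    """Exchange messages across consecutive levels along body-structure links."""
    incoming: Dict[Tuple[int, str], List[Vector]] = {}
    for idx, coarse, fine in links:
        incoming.setdefault((idx, coarse), []).append(levels[idx + 1][fine])
        incoming.setdefault((idx + 1, fine), []).append(levels[idx][coarse])
    new_levels = [dict(level) for level in levels]
    for (idx, node), feats in incoming.items():
        # Read from the old levels so that all updates are synchronous.
        new_levels[idx][node] = mix(levels[idx][node], mean(feats))
    return new_levels


def reason(levels: List[Level], links: List[Link], rounds: int = 2) -> List[Level]:
    """Alternate intra-level and inter-level propagation for a few rounds."""
    for _ in range(rounds):
        levels = [intra_level_step(level) for level in levels]
        levels = inter_level_step(levels, links)
    return levels


# Toy graph: holistic human, human region, and keypoint levels, each with an object node.
levels = [
    {"human": [1.0, 0.0], "object": [0.0, 1.0]},
    {"upper_body": [0.8, 0.2], "lower_body": [0.6, 0.4], "object": [0.0, 1.0]},
    {"wrist": [0.9, 0.1], "ankle": [0.5, 0.5], "object": [0.0, 1.0]},
]
links = [
    (0, "human", "upper_body"), (0, "human", "lower_body"),
    (1, "upper_body", "wrist"), (1, "lower_body", "ankle"),
]
refined = reason(levels, links)
```

In this toy setup the `links` list plays the role of the body-structure prior: a keypoint only exchanges messages with the region that contains it, and a region only with the holistic human.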


Pages: 8306-8317

Number of pages: 12
