Hierarchical Reasoning Network for Human-Object Interaction Detection

Cited by: 13
Authors
Gao, Yiming [1]
Kuang, Zhanghui [2]
Li, Guanbin [1]
Zhang, Wayne [2]
Lin, Liang [1]
Affiliations
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou 510006, Peoples R China
[2] SenseTime Res, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visualization; Cognition; Correlation; Benchmark testing; Task analysis; Sports; Periodic structures; Human-object interaction; hierarchical reasoning network; graph neural network; REPRESENTATION; CNNS;
DOI
10.1109/TIP.2021.3093784
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Human-object interaction (HOI) detection, which aims to detect <human, verb, object> triplets, is critical for holistic, human-centric scene understanding. Existing approaches ignore correlations among hierarchical human parts and objects. In this work, we introduce a Hierarchical Reasoning Network (HRNet) that captures relations among objects and human parts at multiple scales (the holistic human, human region, and human keypoint levels) via a unified graph. In particular, HRNet first constructs a multi-level human parts graph. Each level takes human parts at one specific scale, objects, and the unions of human part-object pairs as nodes, and uses their mutual visual and spatial layout relations for intra-level reasoning. To also capture relations across scales, we further introduce inter-level reasoning between the nodes of two consecutive levels, based on the prior of human body structure. During reasoning, the representations of graph nodes are propagated along intra-level and inter-level connections in turn. Extensive experiments demonstrate that HRNet obtains new state-of-the-art results on three challenging benchmarks (HICO-DET, V-COCO, and HOI-A), validating the effectiveness of the proposed method.
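
The alternating propagation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the update functions, so the fixed averaging below stands in for whatever learned message functions HRNet uses, and every name here (`intra_level_step`, `inter_level_step`, `reason`, the `links` pairs) is hypothetical. Each level is assumed to be a plain list of node feature vectors, and the body-structure prior is assumed to be a list of (coarse index, fine index) pairs.

```python
from typing import List, Tuple

Vector = List[float]
Level = List[Vector]
Links = List[Tuple[int, int]]  # (coarse node index, fine node index); assumed prior


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def blend(a: Vector, b: Vector, alpha: float = 0.5) -> Vector:
    """Convex combination of a node's own feature and an incoming message."""
    return [(1 - alpha) * x + alpha * y for x, y in zip(a, b)]


def intra_level_step(nodes: Level) -> Level:
    """Each node receives the mean of the other nodes in the same level."""
    if len(nodes) < 2:
        return [list(v) for v in nodes]
    updated = []
    for i, v in enumerate(nodes):
        others = [u for j, u in enumerate(nodes) if j != i]
        updated.append(blend(v, mean(others)))
    return updated


def inter_level_step(coarse: Level, fine: Level, links: Links) -> Tuple[Level, Level]:
    """Exchange messages between linked nodes of two consecutive levels."""
    new_coarse = [list(v) for v in coarse]
    new_fine = [list(v) for v in fine]
    for c in range(len(coarse)):
        children = [fine[f] for (ci, f) in links if ci == c]
        if children:
            new_coarse[c] = blend(coarse[c], mean(children))
    for f in range(len(fine)):
        parents = [coarse[c] for (c, fi) in links if fi == f]
        if parents:
            new_fine[f] = blend(fine[f], mean(parents))
    return new_coarse, new_fine


def reason(levels: List[Level], links_between: List[Links], rounds: int = 1) -> List[Level]:
    """Alternate intra-level and inter-level propagation for a number of rounds."""
    levels = [[list(v) for v in lvl] for lvl in levels]
    for _ in range(rounds):
        levels = [intra_level_step(lvl) for lvl in levels]
        for k, links in enumerate(links_between):
            levels[k], levels[k + 1] = inter_level_step(levels[k], levels[k + 1], links)
    return levels


# Toy usage: one holistic-human node linked to two region-level nodes.
toy_levels = [[[1.0, 0.0]], [[0.0, 1.0], [1.0, 1.0]]]
toy_links = [[(0, 0), (0, 1)]]
toy_result = reason(toy_levels, toy_links, rounds=1)
```

In this toy setup, information first mixes among nodes of the same scale and then flows between the holistic node and its linked region nodes, mirroring the "in turn" ordering the abstract describes. Object and union nodes are omitted for brevity.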
Pages: 8306-8317
Page count: 12