A Survey of Human-Object Interaction Detection

Authors
Gong X. [1, 2]
Zhang Z. [2]
Liu L. [2]
Ma B. [2]
Wu K. [1]
Affiliations
[1] School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu
[2] Graduate School of Tangshan, Southwest Jiaotong University, Tangshan
Keywords
action recognition; human-object interaction (HOI); object detection; visual relationship
DOI
10.3969/j.issn.0258-2724.20210339
Abstract
Human-object interaction (HOI) detection lies at the intersection of object detection, action recognition and visual relationship detection, and aims to identify the interactions between humans and objects in specific application scenarios. This survey systematically summarizes recent work on image-based HOI detection. First, according to how interactions are modeled, HOI detection methods are divided into two categories, global-instance-based and local-instance-based, and representative methods of each are described and analyzed in detail. According to the visual features they use, the global-instance-based methods are further subdivided into those that fuse spatial information, those that fuse appearance information and those that fuse body posture information. Finally, the applications of zero-shot learning, weakly supervised learning and the Transformer model to HOI detection are discussed. The challenges facing HOI detection are listed from three aspects: HOI, visual distraction and motion perspective. Domain generalization, real-time detection and end-to-end networks are identified as future development trends. © 2022 Authors. All rights reserved.
Pages: 693-704
Page count: 11