A Survey of Human-Object Interaction Detection

Cited by: 0
Authors
Gong X. [1, 2]
Zhang Z. [2]
Liu L. [2]
Ma B. [2]
Wu K. [1]
Affiliations
[1] School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu
[2] Graduate School of Tangshan, Southwest Jiaotong University, Tangshan
Keywords
action recognition; human-object interaction (HOI); object detection; visual relationship
DOI
10.3969/j.issn.0258-2724.20210339
Abstract
Human-object interaction (HOI) detection sits at the intersection of object detection, action recognition and visual relationship detection, and aims to identify the interactions between humans and objects in specific application scenarios. This survey systematically summarizes recent work on image-based HOI detection. First, according to how interactions are modeled, HOI detection methods are divided into two categories, global-instance-based and local-instance-based, and representative methods of each are described and analyzed in detail. Second, according to the visual features they use, global-instance-based methods are further subdivided into those that fuse spatial information, those that fuse appearance information, and those that fuse body posture information. Finally, the applications of zero-shot learning, weakly supervised learning and the Transformer model to HOI detection are discussed. The challenges facing HOI detection are listed from three aspects (HOI, visual distraction and motion perspective), and domain generalization, real-time detection and end-to-end networks are identified as future development trends. © 2022 Authors. All rights reserved.
Pages: 693-704
Page count: 11