Distillation of human-object interaction contexts for action recognition

Cited by: 1
Authors
Almushyti, Muna [1 ,2 ]
Li, Frederick W. B. [1 ]
Affiliations
[1] Univ Durham, Dept Comp Sci, South Rd, Durham DH1 3LE, England
[2] Qassim Univ, Deanship Educ Serv, Buraydah, Saudi Arabia
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
global context; graph attention network; local context; human-object interaction;
DOI
10.1002/cav.2107
Chinese Library Classification (CLC) Number
TP31 [Computer Software];
Discipline Classification Code
081202; 0835;
Abstract
Modeling spatial-temporal relations is essential for recognizing human actions, especially when a human is interacting with objects and multiple objects appear around the human differently over time. Most existing action recognition models focus on learning the overall visual cues of a scene but disregard a holistic view of human-object relationships and interactions, that is, how a human interacts with objects to complete short-term tasks and pursue a long-term goal. We therefore argue that human action recognition can be improved by exploiting both the local and global contexts of human-object interactions (HOIs). In this paper, we propose the Global-Local Interaction Distillation Network (GLIDN), which learns human and object interactions across space and time via knowledge distillation for holistic HOI understanding. GLIDN encodes humans and objects as graph nodes and learns their local and global relations via graph attention networks. The local context graphs capture the relations between humans and objects at the frame level, modeling their co-occurrence at each time step. The global relation graph is constructed at the video level, identifying the long-term relations between humans and objects throughout a video sequence. We also investigate how knowledge from each graph can be distilled to its counterpart to improve HOI recognition. Finally, we evaluate our model through comprehensive experiments on two datasets, Charades and CAD-120. Our method outperforms the baselines and counterpart approaches.
Pages: 16
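
The abstract describes frame-level (local) and video-level (global) human-object graphs built with graph attention, plus a distillation step that transfers knowledge between the two contexts. Below is a minimal illustrative sketch of that idea in PyTorch; it is not the authors' implementation, and the module names, feature sizes, and the symmetric MSE distillation term are assumptions made purely for demonstration.

```python
# Minimal sketch (not the authors' code) of the GLIDN idea from the abstract:
# human/object region features become graph nodes, a graph attention layer models
# frame-level (local) and video-level (global) relations, and a distillation loss
# aligns the two contexts. All names and sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphAttentionLayer(nn.Module):
    """Single-head graph attention over a fully connected set of nodes."""

    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)

    def forward(self, nodes):                      # nodes: (N, dim)
        attn = self.query(nodes) @ self.key(nodes).t() / nodes.size(-1) ** 0.5
        attn = attn.softmax(dim=-1)                # pairwise relation weights
        return nodes + attn @ self.value(nodes)    # residual message passing


class GLIDNSketch(nn.Module):
    def __init__(self, dim=256, num_classes=157):  # 157 = Charades action classes
        super().__init__()
        self.local_gat = GraphAttentionLayer(dim)   # per-frame human-object graph
        self.global_gat = GraphAttentionLayer(dim)  # video-level human-object graph
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, frame_nodes):                 # (T, N, dim): T frames, N nodes
        local = torch.stack([self.local_gat(f) for f in frame_nodes])   # (T, N, dim)
        global_ctx = self.global_gat(frame_nodes.flatten(0, 1))         # (T*N, dim)
        global_ctx = global_ctx.view_as(local)
        # Distillation: encourage each context to match its counterpart.
        distill = F.mse_loss(local, global_ctx.detach()) + \
                  F.mse_loss(global_ctx, local.detach())
        logits = self.classifier((local + global_ctx).mean(dim=(0, 1)))
        return logits, distill


if __name__ == "__main__":
    model = GLIDNSketch()
    feats = torch.randn(8, 5, 256)                  # 8 frames, 5 human/object nodes
    logits, distill_loss = model(feats)
    print(logits.shape, distill_loss.item())
```

In the paper, the graph nodes come from detected human and object region features and knowledge is distilled between the local and global graphs in both directions; the symmetric MSE term above merely stands in for that mechanism.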