Distilling interaction knowledge for semi-supervised egocentric action recognition

Citations: 0
Authors
Wang, Haoran [1]
Yang, Jiahao [1]
Yu, Baosheng [2]
Zhan, Yibing [3]
Tao, Dapeng [4]
Ling, Haibin [5]
Affiliations
[1] Northeastern Univ, Coll Informat Sci & Engn, Shenyang 110819, Peoples R China
[2] Univ Sydney, Sch Comp Sci, Darlington, NSW 2008, Australia
[3] JD Explore Acad, Beijing 100176, Peoples R China
[4] Yunnan Univ, Sch Informat Sci & Engn, Kunming 650091, Yunnan, Peoples R China
[5] SUNY Stony Brook, Dept Comp Sci, Stony Brook, NY USA
Keywords
Knowledge distillation; Semi-supervised learning; Egocentric action recognition;
DOI
10.1016/j.patcog.2024.110927
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Egocentric action recognition, the identification of actions in video captured from a first-person perspective, is receiving increasing attention due to the widespread adoption of wearable cameras. However, annotating actions in videos with cluttered backgrounds and many objects is labor-intensive. In this paper, we study egocentric action recognition in a semi-supervised setting. Motivated by the observation that first-person videos usually contain rich information about how human hands interact with objects, we adopt a teacher-student framework and distill the interaction knowledge between hands and objects for semi-supervised egocentric action recognition. We refer to the proposed method as Interaction Knowledge Distillation (IKD). Specifically, the teacher network takes hands and action-related objects in the labeled videos as input and uses graph neural networks to capture their spatial-temporal relations as graph edge features. The student network then takes the detected hands and objects from both labeled and unlabeled videos as input and mimics the teacher network, learning from the interactions to improve performance. Experiments on two popular egocentric action recognition datasets, Something-Something-V2 and EPIC-KITCHENS-100, show that the proposed approach consistently outperforms recent state-of-the-art methods in typical semi-supervised settings.
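The teacher-student idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the actual loss functions, so everything below is an assumption made for illustration: the mean-squared-error mimicry term, the cross-entropy term for labeled clips, the `weight` parameter, and all function names are hypothetical and are not taken from the paper. It also assumes a trained teacher can supply interaction features for any clip.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Supervised loss for one labeled clip (label is a class index)."""
    return -math.log(softmax(logits)[label])


def feature_mimic_loss(student_feat, teacher_feat):
    """Assumed mimicry term: mean squared error between the student's and
    the teacher's interaction features (hypothetical choice)."""
    n = len(student_feat)
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / n


def distillation_loss(student_logits, student_feat, teacher_feat,
                      label=None, weight=1.0):
    """Total loss for one clip.

    Unlabeled clips (label is None) contribute only the mimicry term;
    labeled clips add the supervised cross-entropy term.
    """
    loss = weight * feature_mimic_loss(student_feat, teacher_feat)
    if label is not None:
        loss += cross_entropy(student_logits, label)
    return loss
```

In this sketch an unlabeled clip is trained purely by matching the teacher's features, which is one common way to let unlabeled data shape the student; the paper may combine the terms differently.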
Pages: 10
Related papers
50 in total
  • [21] Semi-Supervised Action Recognition From Temporal Augmentation Using Curriculum Learning
    Tong, Anyang
    Tang, Chao
    Wang, Wenjian
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (03) : 1305 - 1319
  • [22] Neighbor-Guided Consistent and Contrastive Learning for Semi-Supervised Action Recognition
    Wu, Jianlong
    Sun, Wei
    Gan, Tian
    Ding, Ning
    Jiang, Feijun
    Shen, Jialie
    Nie, Liqiang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 2215 - 2227
  • [23] A novel semi-supervised learning for face recognition
    Gao, Quanxue
    Huang, Yunfang
    Gao, Xinbo
    Shen, Weiguo
    Zhang, Hailin
    NEUROCOMPUTING, 2015, 152 : 69 - 76
  • [24] SEMI-SUPERVISED LEARNING FOR MUSICAL INSTRUMENT RECOGNITION
    Diment, Aleksandr
    Heittola, Toni
    Virtanen, Tuomas
    2013 PROCEEDINGS OF THE 21ST EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2013,
  • [25] Semi-supervised learning for tongue constitution recognition
    Ma, Yichao
    Wu, Chunhong
    Li, Tian
    FOURTEENTH INTERNATIONAL CONFERENCE ON GRAPHICS AND IMAGE PROCESSING, ICGIP 2022, 2022, 12705
  • [26] Semi-supervised Generic Descriptor in Face Recognition
    Han, Pang Ying
    Ling, Goh Fan
    Yin, Ooi Shih
    2015 IEEE 11TH INTERNATIONAL COLLOQUIUM ON SIGNAL PROCESSING & ITS APPLICATIONS (CSPA 2015), 2015, : 21 - 25
  • [27] Semi-supervised Model for Emotion Recognition in Speech
    Pereira, Ingryd
    Santos, Diego
    Maciel, Alexandre
    Barros, Pablo
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2018, PT I, 2018, 11139 : 791 - 800
  • [28] Ensemble Knowledge Distillation for Federated Semi-Supervised Image Classification
    Shang, Ertong
    Liu, Hui
    Zhang, Jingyang
    Zhao, Runqi
    Du, Junzhao
    TSINGHUA SCIENCE AND TECHNOLOGY, 2025, 30 (01) : 112 - 123
  • [29] ASSET: A Semi-supervised Approach for Entity Typing in Knowledge Graphs
    Zahera, Hamada M.
    Heindorf, Stefan
    Ngomo, Axel-Cyrille Ngonga
    PROCEEDINGS OF THE 11TH KNOWLEDGE CAPTURE CONFERENCE (K-CAP '21), 2021, : 261 - 264
  • [30] Compressed video ensemble based pseudo-labeling for semi-supervised action recognition
    Terao, Hayato
    Noguchi, Wataru
    Iizuka, Hiroyuki
    Yamamoto, Masahito
    MACHINE LEARNING WITH APPLICATIONS, 2022, 9