Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition

Cited: 0
Authors
Hatano, Masashi [1 ]
Hachiuma, Ryo [2 ]
Fujii, Ryo [1 ]
Saito, Hideo [1 ]
Affiliations
[1] Keio Univ, Minato, Japan
[2] NVIDIA, Santa Clara, CA USA
Source
COMPUTER VISION - ECCV 2024, PT XXXIII | 2025 / Vol. 15091
Keywords
Egocentric Vision; Action Recognition; Cross-Domain; Few-Shot Learning; Multimodal Distillation;
DOI
10.1007/978-3-031-73414-4_11
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We address a novel cross-domain few-shot learning (CD-FSL) task with multimodal input and unlabeled target data for egocentric action recognition. This paper simultaneously tackles two critical challenges associated with egocentric action recognition in CD-FSL settings: (1) the extreme domain gap in egocentric videos (e.g., daily life vs. industrial domains) and (2) the computational cost for real-world applications. We propose MM-CDFSL, a domain-adaptive and computationally efficient approach designed to enhance adaptability to the target domain and reduce inference cost. To address the first challenge, we incorporate multimodal distillation into the student RGB model using teacher models, each trained independently on source and target data for its respective modality. Leveraging only unlabeled target data during multimodal distillation enhances the student model's adaptability to the target domain. We further introduce ensemble masked inference, a technique that reduces the number of input tokens through masking; ensemble prediction mitigates the performance degradation caused by masking, effectively addressing the second challenge. Our approach outperforms state-of-the-art CD-FSL approaches by a substantial margin on multiple egocentric datasets, improving by an average of 6.12/6.10 points in the 1-shot/5-shot settings while achieving 2.2 times faster inference. Project page: https://masashi-hatano.github.io/MM-CDFSL/
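The ensemble masked inference described in the abstract can be caricatured as follows. This is a minimal toy sketch, not the paper's implementation: the stand-in model, token dimensions, keep ratio, and number of masked views are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_model(tokens):
    # Stand-in for the student RGB model: mean-pool tokens, linear head, softmax.
    W = np.full((4, 3), 0.1)          # hypothetical 4-dim features -> 3 classes
    logits = tokens.mean(axis=0) @ W
    e = np.exp(logits - logits.max())
    return e / e.sum()

def ensemble_masked_inference(tokens, keep_ratio=0.5, n_views=4):
    """Run the model on several randomly masked token subsets and average
    the class probabilities, trading per-pass cost for ensemble stability."""
    n = tokens.shape[0]
    k = max(1, int(n * keep_ratio))   # number of tokens kept per masked view
    probs = []
    for _ in range(n_views):
        idx = rng.choice(n, size=k, replace=False)  # drop the rest of the tokens
        probs.append(toy_model(tokens[idx]))
    return np.mean(probs, axis=0)

tokens = rng.normal(size=(16, 4))     # 16 input tokens, 4-dim each
p = ensemble_masked_inference(tokens)
```

Each masked pass sees only half the tokens, so it is cheaper than a full forward pass; averaging several such passes recovers much of the accuracy that masking alone would sacrifice.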
Pages: 182-199 (18 pages)