Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition

Cited: 0
Authors
Hatano, Masashi [1 ]
Hachiuma, Ryo [2 ]
Fujii, Ryo [1 ]
Saito, Hideo [1 ]
Affiliations
[1] Keio Univ, Minato, Japan
[2] NVIDIA, Santa Clara, CA USA
Source
COMPUTER VISION - ECCV 2024, PT XXXIII | 2025 / Vol. 15091
Keywords
Egocentric Vision; Action Recognition; Cross-Domain; Few-Shot Learning; Multimodal Distillation;
DOI
10.1007/978-3-031-73414-4_11
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We address a novel cross-domain few-shot learning (CD-FSL) task with multimodal input and unlabeled target data for egocentric action recognition. This paper simultaneously tackles two critical challenges associated with egocentric action recognition in CD-FSL settings: (1) the extreme domain gap in egocentric videos (e.g., daily life vs. industrial domains) and (2) the computational cost for real-world applications. We propose MM-CDFSL, a domain-adaptive and computationally efficient approach designed to enhance adaptability to the target domain and reduce inference cost. To address the first challenge, we incorporate multimodal distillation into the student RGB model using teacher models, each trained independently on source and target data for its respective modality. Leveraging only unlabeled target data during multimodal distillation enhances the student model's adaptability to the target domain. We further introduce ensemble masked inference, a technique that reduces the number of input tokens through masking; ensemble prediction mitigates the performance degradation caused by masking, effectively addressing the second challenge. Our approach outperforms state-of-the-art CD-FSL approaches by a substantial margin on multiple egocentric datasets, improving by an average of 6.12/6.10 points in the 1-shot/5-shot settings while achieving 2.2 times faster inference. Project page: https://masashi-hatano.github.io/MM-CDFSL/
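The ensemble masked inference described in the abstract can be caricatured as follows. This is a minimal toy sketch, not the paper's implementation: the stand-in model, token dimensions, keep ratio, and number of masked views are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_model(tokens):
    # Stand-in for the student RGB model: mean-pool tokens, linear head, softmax.
    W = np.full((4, 3), 0.1)          # hypothetical 4-dim features -> 3 classes
    logits = tokens.mean(axis=0) @ W
    e = np.exp(logits - logits.max())
    return e / e.sum()

def ensemble_masked_inference(tokens, keep_ratio=0.5, n_views=4):
    """Run the model on several randomly masked token subsets and average
    the class probabilities, trading per-pass cost for ensemble stability."""
    n = tokens.shape[0]
    k = max(1, int(n * keep_ratio))   # number of tokens kept per masked view
    probs = []
    for _ in range(n_views):
        idx = rng.choice(n, size=k, replace=False)  # drop the rest of the tokens
        probs.append(toy_model(tokens[idx]))
    return np.mean(probs, axis=0)

tokens = rng.normal(size=(16, 4))     # 16 input tokens, 4-dim each
p = ensemble_masked_inference(tokens)
```

Each masked pass sees only half the tokens, so it is cheaper than a full forward pass; averaging several such passes recovers much of the accuracy that masking alone would sacrifice.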
Pages: 182-199 (18 pages)