UPL-Net: Uncertainty-aware prompt learning network for semi-supervised action recognition

Times Cited: 0
Authors
Yang, Shu [1]
Li, Ya-Li [1]
Wang, Shengjin [1]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing 100084, Peoples R China
Keywords
Semi-supervised learning; Prompt learning; Vision-language pre-training; Action recognition; Uncertainty estimation;
DOI
10.1016/j.neucom.2024.129126
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
This paper focuses on understanding human behavior in videos by reframing the traditional video classification task as a transfer learning problem centered on visual concepts. Unlike existing action recognition approaches that rely solely on single-modal representations and video classifiers, our method leverages an uncertainty-aware prompt learning network (UPL-Net). This network is designed to extract spatiotemporal features that are pertinent to action-related concepts in videos while ensuring that the visual concepts derived from images are preserved. Furthermore, we introduce an uncertainty-guided semi-supervised learning strategy that harnesses unlabeled videos to enhance the model's generalizability. Extensive experiments conducted on benchmark datasets, namely UCF and HMDB, demonstrate the superiority of our approach over state-of-the-art semi-supervised action recognition methods. Notably, under a 1% labeling rate on the UCF dataset, our method achieves a significant improvement of 12.8%, underscoring its effectiveness in leveraging limited labeled data and abundant unlabeled videos for improved performance.
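The record gives no implementation details, so the following is only a minimal, self-contained sketch of the two ingredients the abstract names: learnable prompt vectors on top of a frozen CLIP-style video/text encoder, and an uncertainty-gated pseudo-label loss on unlabeled clips. Every name, shape, and hyperparameter below (PromptedClassifier, n_ctx, the Monte-Carlo-dropout gating rule, the 0.05 threshold) is an illustrative assumption, not the authors' released code.

# Hedged sketch: prompt learning + uncertainty-gated pseudo-labeling (assumed setup).
import torch
import torch.nn as nn
import torch.nn.functional as F


class PromptedClassifier(nn.Module):
    """Learnable prompt vectors shared across frozen class-name embeddings (CoOp-style)."""

    def __init__(self, class_embeds: torch.Tensor, n_ctx: int = 8):
        super().__init__()
        dim = class_embeds.shape[1]
        self.register_buffer("class_embeds", class_embeds)       # frozen text features, [C, D]
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)  # learnable context vectors

    def forward(self, video_feats: torch.Tensor) -> torch.Tensor:
        # Simplification: fold the mean context vector into every class prototype.
        text_feats = F.normalize(self.class_embeds + self.ctx.mean(dim=0), dim=-1)
        video_feats = F.normalize(video_feats, dim=-1)
        return 100.0 * video_feats @ text_feats.t()               # cosine-similarity logits


def uncertainty_gated_loss(logits_weak, logits_strong, threshold=0.05, n_samples=8, p_drop=0.1):
    """Pseudo-label loss on unlabeled clips, kept only where the prediction is stable.

    Uncertainty is approximated here with Monte-Carlo dropout over the weak-view logits;
    the gating rule (std of the winning class < threshold) is an assumed stand-in for
    whatever estimator the paper actually uses.
    """
    with torch.no_grad():
        probs = torch.stack([
            F.softmax(F.dropout(logits_weak, p=p_drop, training=True), dim=-1)
            for _ in range(n_samples)
        ])                                                        # [n_samples, B, C]
        pseudo = probs.mean(dim=0).argmax(dim=-1)                 # hard pseudo-labels, [B]
        uncertainty = probs.std(dim=0).gather(-1, pseudo.unsqueeze(-1)).squeeze(-1)
        mask = (uncertainty < threshold).float()                  # keep only low-uncertainty clips
    loss = F.cross_entropy(logits_strong, pseudo, reduction="none")
    return (mask * loss).sum() / mask.sum().clamp(min=1.0)


if __name__ == "__main__":
    torch.manual_seed(0)
    class_embeds = torch.randn(101, 512)   # e.g. 101 action classes, 512-d text features
    head = PromptedClassifier(class_embeds)
    weak, strong = torch.randn(4, 512), torch.randn(4, 512)  # two augmented views of 4 clips
    print(uncertainty_gated_loss(head(weak), head(strong)))

In a full semi-supervised pipeline, the weak and strong features would presumably come from two augmentations of the same unlabeled clip, and this gated loss would be added to the standard supervised cross-entropy on the labeled subset.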
Pages: 11