EMPC: Efficient multi-view parallel co-learning for semi-supervised action recognition

Cited by: 1
Authors
Tong, Anyang [1 ,2 ]
Tang, Chao [1 ,2 ]
Wang, Wenjian [3 ]
Affiliations
[1] Hefei Univ, Sch Artificial Intelligence & Big Data, Hefei 230601, Anhui, Peoples R China
[2] Anhui Univ, Anhui Prov Key Lab Multimodal Cognit Computat, Hefei 230601, Anhui, Peoples R China
[3] Shanxi Univ, Sch Comp & Informat Sci, Taiyuan 030006, Shanxi, Peoples R China
Keywords
Action recognition; Semi-supervised learning; Temporal gradient; Co-learning; Dropout
DOI
10.1016/j.eswa.2024.124634
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Semi-supervised learning (SSL) is an effective approach to the challenge of limited labeled data in action recognition. Existing methods exploring temporal augmentation and consistency learning have received widespread attention. However, these methods incur an exponential increase in computational cost and neglect the potential for divergence and collaboration between modalities. Additionally, the models may exhibit randomness during pseudo-label evaluation and inconsistency between training and inference. To address these challenges, we propose an efficient multi-view parallel co-learning (EMPC) method for semi-supervised action recognition. First, we explore the temporal gradient (TG) and create a new view that contains rich motion history information, called the historical temporal gradient (HTG). Second, inspired by the working mechanism of Dropout, we assemble a computationally lightweight multi-functional committee (MFC) and perform pseudo-label editing based on two evaluation criteria: confidence and consistency. We further design a new MFC-based regularization strategy, called mean regularized dropout (MR-Drop), which measures and reduces the uncertainty across the sub-models' output distributions to improve the model's performance. Finally, exploiting the complementary information between the RGB and HTG views, we build an efficient parallel network with multi-view feature sharing and pseudo-label collaboration. We evaluate EMPC on three public datasets: UCF-101, HMDB-51, and Kinetics-100. The experimental results demonstrate that EMPC achieves better classification performance with a limited amount of labeled data and a large amount of unlabeled data.
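The abstract describes two algorithmic components in general terms. The sketch below is a rough illustration only, not the authors' implementation: it assumes the HTG view is built by accumulating frame-difference (TG) maps with an exponential decay, and that MR-Drop penalizes the KL divergence between each Dropout sub-model's output distribution and their mean. The function names (historical_temporal_gradient, mr_drop_loss), the decay factor, the tensor shapes, and the number of committee passes K are all illustrative assumptions.

```python
# Hedged sketch of two ideas from the abstract (assumed forms, not the paper's code):
# (1) an HTG view built by accumulating temporal gradients with exponential decay;
# (2) an MR-Drop-style loss shrinking disagreement between dropout sub-models.
import torch
import torch.nn.functional as F


def historical_temporal_gradient(frames: torch.Tensor, decay: float = 0.5) -> torch.Tensor:
    """frames: (B, T, C, H, W) RGB clip.
    Returns an assumed HTG view of shape (B, T-1, C, H, W): each step adds the
    current temporal gradient (frame difference) to a decayed motion history.
    """
    tg = frames[:, 1:] - frames[:, :-1]        # plain temporal gradient per step
    htg = torch.zeros_like(tg)
    history = torch.zeros_like(tg[:, 0])
    for t in range(tg.shape[1]):
        history = decay * history + tg[:, t]   # exponentially decayed motion history
        htg[:, t] = history
    return htg


def mr_drop_loss(logits_list: list) -> torch.Tensor:
    """logits_list: K forward passes of the same batch under different dropout masks.
    Penalizes the divergence between each sub-model's distribution and the mean
    distribution, reducing output disagreement across the committee.
    """
    probs = [F.softmax(l, dim=-1) for l in logits_list]
    mean_p = torch.stack(probs).mean(dim=0)
    loss = torch.zeros(())
    for l in logits_list:
        log_p = F.log_softmax(l, dim=-1)
        # F.kl_div(log_p, mean_p) computes KL(mean || sub-model), averaged per batch
        loss = loss + F.kl_div(log_p, mean_p, reduction="batchmean")
    return loss / len(logits_list)


if __name__ == "__main__":
    clip = torch.randn(2, 8, 3, 112, 112)      # toy RGB clip
    print(historical_temporal_gradient(clip).shape)   # torch.Size([2, 7, 3, 112, 112])

    head = torch.nn.Sequential(torch.nn.Dropout(0.5), torch.nn.Linear(16, 5))
    head.train()                               # keep dropout active for committee passes
    x = torch.randn(4, 16)
    logits = [head(x) for _ in range(3)]       # K = 3 dropout sub-models
    print(mr_drop_loss(logits).item())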
Pages: 13