End-to-End Semi-Supervised Learning for Video Action Detection

Cited by: 22

Authors:

Kumar, Akash [1]

Rawat, Yogesh Singh [1]

Affiliation:

[1] Univ Cent Florida, Ctr Res Comp Vis, Orlando, FL 32816 USA

Source:

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) | 2022

Keywords:

DOI:

10.1109/CVPR52688.2022.01429

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In this work, we focus on semi-supervised learning for video action detection, which utilizes both labeled and unlabeled data. We propose a simple end-to-end consistency-based approach that effectively utilizes the unlabeled data. Video action detection requires both action class prediction and spatio-temporal localization of actions. Therefore, we investigate two types of constraints: classification consistency and spatio-temporal consistency. The presence of predominant background and static regions in a video makes it challenging to utilize spatio-temporal consistency for action detection. To address this, we propose two novel regularization constraints for spatio-temporal consistency: 1) temporal coherency and 2) gradient smoothness. Both exploit the temporal continuity of action in videos and are found to be effective for utilizing unlabeled videos for action detection. We demonstrate the effectiveness of the proposed approach on two action detection benchmark datasets, UCF101-24 and JHMDB-21. In addition, we show the effectiveness of the proposed approach for video object segmentation on YouTube-VOS, which demonstrates its generalization capability. The proposed approach achieves competitive performance using merely 20% of annotations on UCF101-24 when compared with recent fully supervised methods. On UCF101-24, it improves the score by +8.9% and +11% at 0.5 f-mAP and v-mAP, respectively, compared to the supervised approach. The code and models will be made publicly available at: https://github.com/AKASH2907/End-to-End-Semi-Supervised-Learning-for-Video-Action-Detection.
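
The abstract names two consistency constraints without giving formulas, so the following is only a minimal NumPy sketch of how such terms could look, not the authors' implementation. It assumes two augmented views of the same unlabeled clip, per-clip class logits, and per-pixel localization maps of shape (T, H, W). The Jensen-Shannon divergence, the squared-error term, the temporal-difference weighting, and all function names are illustrative assumptions; the abstract's gradient smoothness constraint is not sketched here.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def classification_consistency(logits_a, logits_b, eps=1e-8):
    """Mean Jensen-Shannon divergence between the class predictions of two views.

    Assumption: JSD is one common choice for a consistency term; the abstract
    does not state which divergence the paper uses.
    """
    p, q = softmax(logits_a), softmax(logits_b)
    m = 0.5 * (p + q)

    def kl(x, y):
        return np.sum(x * (np.log(x + eps) - np.log(y + eps)), axis=-1)

    return float(np.mean(0.5 * kl(p, m) + 0.5 * kl(q, m)))


def temporal_change_weight(loc_map):
    """Hypothetical per-pixel weight that emphasizes regions changing over time.

    loc_map has shape (T, H, W) with T >= 2. Static background pixels get a
    weight near zero, which is one way to counter the dominance of background
    and static regions mentioned in the abstract.
    """
    diff = np.abs(np.diff(loc_map, axis=0))            # (T-1, H, W)
    return np.concatenate([diff, diff[-1:]], axis=0)   # pad back to (T, H, W)


def spatiotemporal_consistency(map_a, map_b, weight=None, eps=1e-8):
    """(Optionally weighted) mean squared difference between two localization maps."""
    sq = (map_a - map_b) ** 2
    if weight is None:
        return float(sq.mean())
    return float((weight * sq).sum() / (weight.sum() + eps))
```

In a training loop, terms like these would typically be added to the supervised loss on the labeled clips, with the weighted variant applied to unlabeled clips so that moving regions contribute more than static background.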


Pages: 14680-14690

Page count: 11
