End-to-End Semi-Supervised Learning for Video Action Detection

Cited by: 22

Authors:

Kumar, Akash [1]

Rawat, Yogesh Singh [1]

Affiliation:

[1] Univ Cent Florida, Ctr Res Comp Vis, Orlando, FL 32816 USA

Source:

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) | 2022

DOI:

10.1109/CVPR52688.2022.01429

Chinese Library Classification (CLC) number:

TP18 [Theory of Artificial Intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In this work, we focus on semi-supervised learning for video action detection, which utilizes both labeled and unlabeled data. We propose a simple end-to-end consistency-based approach that effectively utilizes the unlabeled data. Video action detection requires both action class prediction and spatio-temporal localization of actions. Therefore, we investigate two types of constraints: classification consistency and spatio-temporal consistency. The predominance of background and static regions in videos makes it challenging to utilize spatio-temporal consistency for action detection. To address this, we propose two novel regularization constraints for spatio-temporal consistency: 1) temporal coherency and 2) gradient smoothness. Both exploit the temporal continuity of actions in videos and are found to be effective for utilizing unlabeled videos for action detection. We demonstrate the effectiveness of the proposed approach on two action detection benchmark datasets, UCF101-24 and JHMDB-21. In addition, we show its effectiveness for video object segmentation on YouTube-VOS, which demonstrates its generalization capability. Using merely 20% of the annotations on UCF101-24, the proposed approach achieves performance competitive with recent fully supervised methods. On UCF101-24, it improves f-mAP@0.5 and v-mAP@0.5 by +8.9% and +11%, respectively, compared with the supervised approach. The code and models will be made publicly available at https://github.com/AKASH2907/End-to-End-Semi-Supervised-Learning-for-Video-Action-Detection.
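The abstract names the consistency constraints but does not give their formulas. As a rough illustration only, the pure-Python sketch below shows generic stand-ins, not the authors' definitions. It assumes a horizontal flip as the augmentation, class probabilities produced by a softmax, and per-frame foreground localization maps stored as nested lists (T frames x H x W). Classification consistency is measured here with the Jensen-Shannon divergence, spatio-temporal consistency with a mean squared error after undoing the flip, and smoothness with a squared second-order difference along time. All function names (`softmax`, `js_divergence`, `hflip`, `spatial_consistency`, `temporal_smoothness`) are hypothetical, and temporal coherency is not modeled.

```python
import math
from typing import List

# Assumed layout (hypothetical): a frame is an H x W grid of foreground
# probabilities, and a clip is a list of T such frames.
Frame = List[List[float]]
Clip = List[Frame]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL divergence KL(p || q), with eps to avoid log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def js_divergence(p: List[float], q: List[float]) -> float:
    """Symmetric Jensen-Shannon divergence: a generic classification-consistency term."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def hflip(clip: Clip) -> Clip:
    """Horizontally flip every frame (the assumed augmentation)."""
    return [[row[::-1] for row in frame] for frame in clip]


def spatial_consistency(clip_a: Clip, clip_b_flipped: Clip) -> float:
    """Mean squared error between two views' maps after undoing the flip on view b."""
    clip_b = hflip(clip_b_flipped)  # realign pixels with view a
    diffs = [
        (a - b) ** 2
        for fa, fb in zip(clip_a, clip_b)
        for ra, rb in zip(fa, fb)
        for a, b in zip(ra, rb)
    ]
    return sum(diffs) / len(diffs)


def temporal_smoothness(clip: Clip) -> float:
    """Mean squared second-order difference over time (generic smoothness stand-in)."""
    terms = []
    for t in range(1, len(clip) - 1):
        for r_prev, r_cur, r_next in zip(clip[t - 1], clip[t], clip[t + 1]):
            for p, c, n in zip(r_prev, r_cur, r_next):
                terms.append((n - 2 * c + p) ** 2)
    return sum(terms) / len(terms) if terms else 0.0
```

In a semi-supervised setup of this kind, terms like these would typically be computed on unlabeled clips and added, with tunable weights (unspecified here), to the supervised loss on labeled clips.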

Pages: 14680-14690

Number of pages: 11
