End-to-End Semi-Supervised Learning for Video Action Detection

Times cited: 22
Authors
Kumar, Akash [1]
Rawat, Yogesh Singh [1]
Affiliation
[1] Univ Cent Florida, Ctr Res Comp Vis, Orlando, FL 32816 USA
Source
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) | 2022
DOI
10.1109/CVPR52688.2022.01429
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this work, we focus on semi-supervised learning for video action detection, which utilizes both labeled and unlabeled data. We propose a simple end-to-end consistency-based approach that effectively utilizes the unlabeled data. Video action detection requires both action class prediction and spatio-temporal localization of actions. Therefore, we investigate two types of constraints: classification consistency and spatio-temporal consistency. The predominance of background and static regions in a video makes it challenging to utilize spatio-temporal consistency for action detection. To address this, we propose two novel regularization constraints for spatio-temporal consistency: 1) temporal coherency and 2) gradient smoothness. Both exploit the temporal continuity of actions in videos and are found to be effective for utilizing unlabeled videos for action detection. We demonstrate the effectiveness of the proposed approach on two action detection benchmark datasets, UCF101-24 and JHMDB-21. In addition, we show the effectiveness of the proposed approach for video object segmentation on YouTube-VOS, which demonstrates its generalization capability. The proposed approach achieves performance competitive with recent fully supervised methods while using merely 20% of the annotations on UCF101-24. On UCF101-24, it improves f-mAP and v-mAP at a threshold of 0.5 by +8.9% and +11%, respectively, compared to the supervised approach. The code and models will be made publicly available at: https://github.com/AKASH2907/End-to-End-Semi-Supervised-Learning-for-Video-Action-Detection.
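As a rough illustration of the consistency idea described in the abstract, the following NumPy sketch computes unsupervised loss terms for one unlabeled clip and a horizontally flipped copy of it. The flip augmentation, the squared-error distances, the variance-based weighting used as a stand-in for temporal coherency, the second-order temporal difference used for gradient smoothness, and every function name and loss weight are assumptions made for illustration; this is not the authors' released implementation. It computes loss values only, so a real training loop would express the same terms in an autodiff framework.

```python
import numpy as np

# ASSUMPTION: all terms below are an illustrative sketch, not the paper's exact losses.


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def classification_consistency(logits_orig: np.ndarray, logits_flip: np.ndarray) -> float:
    """Mean squared distance between the class distributions of two views of a clip."""
    return float(np.mean((softmax(logits_orig) - softmax(logits_flip)) ** 2))


def spatiotemporal_consistency(loc_orig: np.ndarray, loc_flip: np.ndarray) -> float:
    """Pixel-wise MSE between localization maps of shape (T, H, W).

    loc_flip was predicted on a horizontally flipped clip, so it is
    flipped back along the width axis before the comparison.
    """
    return float(np.mean((loc_orig - loc_flip[:, :, ::-1]) ** 2))


def temporal_variance_weights(loc: np.ndarray, window: int = 3) -> np.ndarray:
    """HYPOTHETICAL temporal-coherency weights: per-pixel variance of the
    prediction over a sliding window of frames, scaled to [0, 1]."""
    num_frames = loc.shape[0]
    weights = np.zeros_like(loc, dtype=float)
    for t in range(num_frames):
        lo, hi = max(0, t - window // 2), min(num_frames, t + window // 2 + 1)
        weights[t] = loc[lo:hi].var(axis=0)
    return weights / (weights.max() + 1e-8)


def weighted_consistency(loc_orig: np.ndarray, loc_flip: np.ndarray, weights: np.ndarray) -> float:
    """Flip-aligned squared error, averaged with per-pixel weights."""
    diff = (loc_orig - loc_flip[:, :, ::-1]) ** 2
    return float(np.sum(weights * diff) / (np.sum(weights) + 1e-8))


def gradient_smoothness(loc: np.ndarray) -> float:
    """Mean absolute second-order temporal difference of a (T, H, W) map.

    A map that changes linearly (or not at all) over time scores 0.
    """
    second = loc[2:] - 2.0 * loc[1:-1] + loc[:-2]
    return float(np.mean(np.abs(second)))


def unlabeled_loss(logits_orig, logits_flip, loc_orig, loc_flip,
                   w_cls=1.0, w_loc=1.0, w_tc=1.0, w_gs=1.0) -> float:
    """Weighted sum of the unsupervised terms (weights are hypothetical)."""
    weights = temporal_variance_weights(loc_orig)
    return (w_cls * classification_consistency(logits_orig, logits_flip)
            + w_loc * spatiotemporal_consistency(loc_orig, loc_flip)
            + w_tc * weighted_consistency(loc_orig, loc_flip, weights)
            + w_gs * gradient_smoothness(loc_orig))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(24,))         # 24 action classes, as in UCF101-24
    loc = rng.uniform(size=(8, 16, 16))     # 8 frames of 16x16 localization maps
    # Identical views: only the gradient-smoothness term is nonzero.
    print(unlabeled_loss(logits, logits, loc, loc[:, :, ::-1]))
```

The flip-back step matters: comparing the two localization maps without undoing the horizontal flip would penalize the model for correctly mirroring its prediction.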
Pages: 14680-14690
Number of pages: 11
Related papers
50 in total
  • [21] End-to-End Rich Transcription-Style Automatic Speech Recognition with Semi-Supervised Learning
    Tanaka, Tomohiro
    Masumura, Ryo
    Ihori, Mana
    Takashima, Akihiko
    Orihashi, Shota
    Makishima, Naoki
    INTERSPEECH 2021, 2021, : 4458 - 4462
  • [22] End-To-End Graph-Based Deep Semi-Supervised Learning with Extended Graph Laplacian
    Wang, Zihao
    Tu, Enmei
    Zhou, Meng
    Yang, Jie
    2020 CHINESE AUTOMATION CONGRESS (CAC 2020), 2020, : 5948 - 5953
  • [23] Dialect-aware Semi-supervised Learning for End-to-End Multi-dialect Speech Recognition
    Shiota, Sayaka
    Imaizumi, Ryo
    Masumura, Ryo
    Kiya, Hitoshi
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 240 - 244
  • [24] Semi-supervised end-to-end ASR via teacher-student learning with conditional posterior distribution
    Zhang, Zi-qiang
    Song, Yan
    Zhang, Jian-shu
    McLoughlin, Ian
    Dai, Li-Rong
    INTERSPEECH 2020, 2020, : 3580 - 3584
  • [25] End-to-End Semi-Supervised Opportunistic Osteoporosis Screening Using Computed Tomography
    Oh, Jieun
    Kim, Boah
    Oh, Gyutaek
    Hwangbo, Yul
    Ye, Jong Chul
    ENDOCRINOLOGY AND METABOLISM, 2024, 39 (03) : 500 - 510
  • [26] Semi-supervised Trajectory Understanding with POI Attention for End-to-End Trip Recommendation
    Zhou, Fan
    Wu, Hantao
    Trajcevski, Goce
    Khokhar, Ashfaq
    Zhang, Kunpeng
    ACM TRANSACTIONS ON SPATIAL ALGORITHMS AND SYSTEMS, 2020, 6 (02)
  • [27] SEMI-SUPERVISED SPEAKER ADAPTATION FOR END-TO-END SPEECH SYNTHESIS WITH PRETRAINED MODELS
    Inoue, Katsuki
    Hara, Sunao
    Abe, Masanobu
    Hayashi, Tomoki
    Yamamoto, Ryuichi
    Watanabe, Shinji
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7634 - 7638
  • [28] SEMI-SUPERVISED END-TO-END SPEECH RECOGNITION VIA LOCAL PRIOR MATCHING
    Hsu, Wei-Ning
    Lee, Ann
    Synnaeve, Gabriel
    Hannun, Awni
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 125 - 132
  • [29] Semi-Supervised Training with Pseudo-Labeling for End-to-End Neural Diarization
    Takashima, Yuki
    Fujita, Yusuke
    Horiguchi, Shota
    Watanabe, Shinji
    Garcia, Paola
    Nagamatsu, Kenji
    INTERSPEECH 2021, 2021, : 3096 - 3100
  • [30] SEMI-SUPERVISED TRAINING FOR IMPROVING DATA EFFICIENCY IN END-TO-END SPEECH SYNTHESIS
    Chung, Yu-An
    Wang, Yuxuan
    Hsu, Wei-Ning
    Zhang, Yu
    Skerry-Ryan, R. J.
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6940 - 6944