Harnessing GANs for Zero-Shot Learning of New Classes in Visual Speech Recognition

Cited by: 0
Authors
Kumar, Yaman [1]
Sahrawat, Dhruva [2]
Maheshwari, Shubham [3]
Mahata, Debanjan [4]
Stent, Amanda [4]
Yin, Yifang [2]
Shah, Rajiv Ratn [3]
Zimmermann, Roger [2]
Affiliations
[1] Adobe, Mountain View, CA 85027 USA
[2] NUS, Singapore, Singapore
[3] IIIT Delhi, MIDAS Lab, Delhi, India
[4] Bloomberg, New York, NY USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Visual Speech Recognition (VSR) is the process of recognizing or interpreting speech by watching the lip movements of the speaker. Recent machine-learning-based approaches model VSR as a classification problem; however, the scarcity of training data leads to error-prone systems with very low accuracies in predicting unseen classes. To solve this problem, we present a novel approach to zero-shot learning by generating new classes using Generative Adversarial Networks (GANs), and show how the addition of unseen class samples increases the accuracy of a VSR system by a significant margin of 27% and allows it to handle speaker-independent out-of-vocabulary phrases. We also show that our models are language agnostic and therefore capable of seamlessly generating, using English training data, videos for a new language (Hindi). To the best of our knowledge, this is the first work to show empirical evidence of the use of GANs for generating training samples of unseen classes in the domain of VSR, hence facilitating zero-shot learning. We make the added videos for new classes publicly available along with our code.
Pages: 2645-2652
Number of pages: 8