Comparison of DCT and Autoencoder-based Features for DNN-HMM Multimodal Silent Speech Recognition

Cited by: 0
Authors
Liu, Licheng [1]
Ji, Yan [1]
Wang, Hongcui [1]
Denby, Bruce [1]
Affiliation
[1] Tianjin University, School of Computer Science and Technology, Tianjin, People's Republic of China
Keywords
silent speech recognition; feature extraction; autoencoder; non-acoustic feature
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper investigates the speech recognition performance of Hidden Markov Model (HMM) and Deep Neural Network-Hidden Markov Model (DNN-HMM) systems for a portable ultrasound-plus-video multimodal silent speech interface, using Discrete Cosine Transform (DCT) and deep autoencoder-based features over a range of dimensionalities. Experimental results show that the two feature types achieve similar Word Error Rates (WER), but that the autoencoder features maintain good performance even for very low-dimensional feature vectors, demonstrating their potential as a very compact representation of the information in multimodal silent speech data. The paper also shows, for the first time, that the DNN-HMM approach, previously shown to benefit both acoustic speech recognition and articulatory sensor-based silent speech recognition, improves recognition performance for video-based silent speech recognition as well.
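The DCT branch of the comparison can be illustrated with a minimal sketch. It assumes each ultrasound or lip-video frame has already been reduced to a small grayscale array, applies an orthonormal 2-D DCT-II, and keeps the `n_coeffs` lowest-frequency coefficients per modality, so that varying `n_coeffs` plays the role of the feature dimensionality. The frame size, the anti-diagonal coefficient ordering, the concatenation-based fusion, and all function names (`dct_2d`, `low_freq_features`, `multimodal_features`) are illustrative assumptions, not the authors' published configuration. Pure Python is used only to keep the example dependency-free.

```python
import math
from typing import List

Matrix = List[List[float]]


def dct_1d(x: List[float]) -> List[float]:
    """Orthonormal DCT-II of a 1-D sequence (naive O(n^2) loop)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct_2d(img: Matrix) -> Matrix:
    """Separable 2-D DCT-II: transform every row, then every column."""
    rows = [dct_1d(list(r)) for r in img]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    # Transpose back so that result[u][v] is (vertical freq u, horizontal freq v).
    return [list(r) for r in zip(*cols)]


def low_freq_features(img: Matrix, n_coeffs: int) -> List[float]:
    """Keep the n_coeffs lowest-frequency DCT coefficients of one frame.

    Assumed ordering: anti-diagonal by anti-diagonal (u + v = 0, 1, 2, ...),
    which visits the same diagonals as a zigzag scan but always in the same
    within-diagonal direction.
    """
    coeffs = dct_2d(img)
    h, w = len(coeffs), len(coeffs[0])
    if not 1 <= n_coeffs <= h * w:
        raise ValueError("n_coeffs must be between 1 and the number of pixels")
    order = sorted(
        ((u, v) for u in range(h) for v in range(w)),
        key=lambda uv: (uv[0] + uv[1], uv[0]),
    )
    return [coeffs[u][v] for u, v in order[:n_coeffs]]


def multimodal_features(us_frame: Matrix, lip_frame: Matrix, n_coeffs: int) -> List[float]:
    """Hypothetical fusion step: concatenate ultrasound and lip-video DCT features."""
    return low_freq_features(us_frame, n_coeffs) + low_freq_features(lip_frame, n_coeffs)


if __name__ == "__main__":
    # Toy 8x8 "frame"; real ultrasound and lip images would be downsampled first.
    frame = [[float((r * 7 + c * 3) % 5) for c in range(8)] for r in range(8)]
    for dim in (4, 12, 30):
        # Per-frame vector length is 2 * dim (one block per modality).
        print(dim, len(multimodal_features(frame, frame, dim)))
```

In practice the naive loops would be replaced by a library transform such as `scipy.fft.dctn(frame, norm="ortho")`, which computes the same orthonormal 2-D DCT-II. In the autoencoder variant described in the abstract, the bottleneck output of a trained encoder would take the place of `low_freq_features`; that network is not sketched here.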
Pages: 5