Comparison of DCT and Autoencoder-based Features for DNN-HMM Multimodal Silent Speech Recognition

Authors
Liu, Licheng [1]
Ji, Yan [1]
Wang, Hongcui [1]
Denby, Bruce [1]
Affiliations
[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin, Peoples R China
Keywords
silent speech recognition; feature extraction; autoencoder; non-acoustic feature
DOI
Not available
Chinese Library Classification
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Hidden Markov Model (HMM) and Deep Neural Network-Hidden Markov Model (DNN-HMM) speech recognition performance for a portable ultrasound + video multimodal silent speech interface is investigated using Discrete Cosine Transform (DCT) and Deep Autoencoder-based features over a range of dimensionalities. Experimental results show that the two types of features achieve similar Word Error Rates, but that the autoencoder features maintain good performance even for very low-dimensional feature vectors, demonstrating their potential as a very compact representation of the information in multimodal silent speech data. It is also shown for the first time that the DNN-HMM approach, which has been demonstrated to be beneficial for acoustic speech recognition and for articulatory sensor-based silent speech, improves performance for video-based silent speech recognition as well.
Pages: 5
Related papers
50 in total
  • [31] Comparison and combination of features in a hybrid HMM/MLP and a HMM/GMM speech recognition system
    Pujol, P
    Pol, S
    Nadeu, C
    Hagen, A
    Bourlard, H
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2005, 13 (01): : 14 - 22
  • [32] BOTTLENECK LINEAR TRANSFORMATION NETWORK ADAPTATION FOR SPEAKER ADAPTIVE TRAINING-BASED HYBRID DNN-HMM SPEECH RECOGNIZER
    Ochiai, Tsubasa
    Matsuda, Shigeki
    Watanabe, Hideyuki
    Lu, Xugang
    Kawai, Hisashi
    Katagiri, Shigeru
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5015 - 5019
  • [33] Deep Autoencoder based Speech Features for Improved Dysarthric Speech Recognition
    Vachhani, Bhavik
    Bhat, Chitralekha
    Das, Biswajit
    Kopparapu, Sunil Kumar
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1854 - 1858
  • [34] Advances in subword-based HMM-DNN speech recognition across languages
    Smit, Peter
    Virpioja, Sami
    Kurimo, Mikko
    COMPUTER SPEECH AND LANGUAGE, 2021, 66 (66):
  • [35] Use of voicing features in HMM-based speech recognition
    Thomson, DL
    Chengalvarayan, R
    SPEECH COMMUNICATION, 2002, 37 (3-4) : 197 - 211
  • [36] Lip-reading via a DNN-HMM Hybrid System Using Combination of The Image-based and Model-based Features
    Rahmani, Mohammad Hasan
    Almasganj, Farshad
    2017 3RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION AND IMAGE ANALYSIS (IPRIA), 2017, : 195 - 199
  • [37] DNN-HMM-based automatic speech recognition system for intelligent LED lighting control
    Xian, J. L.
    Cai, W. X.
    Pan, H. X.
    Chen, N. Z.
    Chen, X. Y.
    Sun, Y. W.
    Yan, D.
    AUTOMATIC CONTROL, MECHATRONICS AND INDUSTRIAL ENGINEERING, 2019, : 73 - 78
  • [38] Novel Front-End Features Based on Neural Graph Embeddings for DNN-HMM and LSTM-CTC Acoustic Modeling
    Liu, Yuzong
    Kirchhoff, Katrin
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 793 - 797
  • [39] UNSUPERVISED DOMAIN ADAPTATION FOR ROBUST SPEECH RECOGNITION VIA VARIATIONAL AUTOENCODER-BASED DATA AUGMENTATION
    Hsu, Wei-Ning
    Zhang, Yu
    Glass, James
    2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 16 - 23
  • [40] A Comparison of Speech Synthesis Systems Based on GPR, HMM, and DNN with a Small Amount of Training Data
    Koriyama, Tomoki
    Kobayashi, Takao
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3496 - 3500