Comparison of DCT and Autoencoder-based Features for DNN-HMM Multimodal Silent Speech Recognition

Authors
Liu, Licheng [1]
Ji, Yan [1]
Wang, Hongcui [1]
Denby, Bruce [1]
Affiliations
[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin, Peoples R China
Keywords
silent speech recognition; feature extraction; autoencoder; non-acoustic feature;
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Hidden Markov Model (HMM) and Deep Neural Network-Hidden Markov Model (DNN-HMM) speech recognition performance is investigated for a portable ultrasound + video multimodal silent speech interface, using Discrete Cosine Transform (DCT) and Deep Autoencoder-based features over a range of dimensionalities. Experimental results show that the two feature types achieve similar Word Error Rates, but that the autoencoder features maintain good performance even for very low-dimensional feature vectors, demonstrating their potential as a very compact representation of the information in multimodal silent speech data. It is also shown for the first time that the DNN-HMM approach, which has been demonstrated to be beneficial for acoustic speech recognition and for articulatory sensor-based silent speech, improves recognition performance for video-based silent speech recognition as well.
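As an illustration of what a DCT-based feature vector of a chosen dimensionality can look like, the following minimal sketch applies an orthonormal 2-D DCT-II to a single image frame and keeps the lowest-frequency coefficients. It is not the authors' pipeline: the abstract does not state the frame size, region of interest, or coefficient selection rule, so the function names (`dct_matrix`, `dct2`, `dct_features`), the low-frequency-first ordering, and the toy 4x4 frame are all assumptions made for this example.

```python
import math


def dct_matrix(n):
    """Return the orthonormal DCT-II basis as an n x n list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def dct2(frame):
    """2-D DCT of a frame (list of rows): C_h * X * C_w^T."""
    h, w = len(frame), len(frame[0])
    return matmul(matmul(dct_matrix(h), frame), transpose(dct_matrix(w)))


def dct_features(frame, n_features):
    """Keep the n_features lowest-frequency coefficients (hypothetical rule).

    Coefficients are ordered by u + v, i.e. by anti-diagonals starting at
    the DC term; this ordering is an assumption, not taken from the paper.
    """
    coeffs = dct2(frame)
    h, w = len(coeffs), len(coeffs[0])
    order = sorted(((u, v) for u in range(h) for v in range(w)),
                   key=lambda p: (p[0] + p[1], p[0]))
    return [coeffs[u][v] for u, v in order[:n_features]]


# Toy 4x4 "frame" standing in for an ultrasound or video image.
frame = [[float(4 * r + c) for c in range(4)] for r in range(4)]
features = dct_features(frame, 6)
```

Varying `n_features` corresponds to varying the dimensionality of the feature vector; an autoencoder would instead learn its low-dimensional code from the data.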
Pages: 5