Deep Neural Network for Robust Speech Recognition With Auxiliary Features From Laser-Doppler Vibrometer Sensor

Cited by: 0
Authors
Xie, Zhipeng [1]
Du, Jun [1]
McLoughlin, Ian [2]
Xu, Yong [3]
Ma, Feng [3]
Wang, Haikun [3]
Affiliations
[1] Univ Sci & Technol China, NELSLIP, Hefei, Anhui, Peoples R China
[2] Univ Kent, Sch Comp, Medway, England
[3] IFlytek Res, Hefei, Anhui, Peoples R China
Source
2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2016
Keywords
laser Doppler vibrometer; auxiliary features; deep neural network; regression model; speech recognition;
DOI
Not available
Chinese Library Classification
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Recently, the signal captured from a laser Doppler vibrometer (LDV) sensor has been used to improve the noise robustness of automatic speech recognition (ASR) systems by enhancing the acoustic signal prior to feature extraction. This study proposes another approach in which auxiliary features extracted from the LDV signal are used alongside conventional acoustic features to further improve ASR performance, with a deep neural network (DNN) as the acoustic model. While this approach is promising, the best training data sets for ASR do not include LDV data in parallel with the acoustic signal. Thus, to leverage such existing large-scale speech databases, a regression DNN is designed to map acoustic features to LDV features. This regression DNN is trained on a limited-size parallel signal data set, then used to form pseudo-LDV features from a massive speech data set for parallel training of an ASR system. Our experiments show that both the features from the limited-scale LDV data set and the massive-scale pseudo-LDV features are able to train an ASR system that significantly outperforms one using acoustic features alone, in both quiet and noisy environments.
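The pseudo-LDV idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the one-hidden-layer network, the plain gradient-descent training loop, the feature dimensions, and the function names (train_regression_mlp, predict_pseudo_ldv, augment_with_pseudo_ldv) are all assumptions made for illustration. The sketch fits a small regression network on parallel (acoustic, LDV) frames, then appends predicted LDV features to acoustic frames that have no LDV recording.

```python
import numpy as np


def train_regression_mlp(acoustic, ldv, hidden=16, epochs=300, lr=0.05, seed=0):
    """Fit a one-hidden-layer MLP mapping acoustic frames to LDV frames.

    Hypothetical stand-in for the regression DNN described in the abstract;
    the architecture and hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n, d_in = acoustic.shape
    d_out = ldv.shape[1]
    w1 = rng.normal(0.0, 0.1, (d_in, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.1, (hidden, d_out))
    b2 = np.zeros(d_out)
    for _ in range(epochs):
        # Forward pass on the whole parallel set (full-batch training).
        h = np.tanh(acoustic @ w1 + b1)
        pred = h @ w2 + b2
        # Gradient of half the squared error, averaged over frames.
        err = (pred - ldv) / n
        grad_w2 = h.T @ err
        grad_b2 = err.sum(axis=0)
        dh = (err @ w2.T) * (1.0 - h ** 2)
        grad_w1 = acoustic.T @ dh
        grad_b1 = dh.sum(axis=0)
        # Plain gradient-descent update.
        w1 -= lr * grad_w1
        b1 -= lr * grad_b1
        w2 -= lr * grad_w2
        b2 -= lr * grad_b2
    return w1, b1, w2, b2


def predict_pseudo_ldv(params, acoustic):
    """Predict pseudo-LDV features for acoustic frames that lack LDV data."""
    w1, b1, w2, b2 = params
    return np.tanh(acoustic @ w1 + b1) @ w2 + b2


def augment_with_pseudo_ldv(acoustic, params):
    """Concatenate each acoustic frame with its predicted LDV features."""
    return np.concatenate([acoustic, predict_pseudo_ldv(params, acoustic)], axis=1)
```

In this sketch the augmented frames would then serve as input to the acoustic-model DNN; how the paper actually combines the two feature streams is not stated in the abstract.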
Pages: 5