Deep Neural Network for Robust Speech Recognition With Auxiliary Features From Laser-Doppler Vibrometer Sensor

Cited by: 0
Authors
Xie, Zhipeng [1]
Du, Jun [1]
McLoughlin, Ian [2]
Xu, Yong [3]
Ma, Feng [3]
Wang, Haikun [3]
Affiliations
[1] Univ Sci & Technol China, NELSLIP, Hefei, Anhui, Peoples R China
[2] Univ Kent, Sch Comp, Medway, England
[3] IFlytek Res, Hefei, Anhui, Peoples R China
Source
2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2016
Keywords
laser Doppler vibrometer; auxiliary features; deep neural network; regression model; speech recognition;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
Recently, the signal captured from a laser Doppler vibrometer (LDV) sensor has been used to improve the noise robustness of automatic speech recognition (ASR) systems by enhancing the acoustic signal prior to feature extraction. This study proposes another approach in which auxiliary features extracted from the LDV signal are used alongside conventional acoustic features to further improve ASR performance, with a deep neural network (DNN) as the acoustic model. While this approach is promising, the best training data sets for ASR do not include LDV data in parallel with the acoustic signal. Thus, to leverage such existing large-scale speech databases, a regression DNN is designed to map acoustic features to LDV features. This regression DNN is trained on a limited-size parallel data set, then used to generate pseudo-LDV features from a massive speech data set for parallel training of an ASR system. Our experiments show that both the features from the limited-scale LDV data set and the massive-scale pseudo-LDV features can train an ASR system that significantly outperforms one using acoustic features alone, in both quiet and noisy environments.
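The pseudo-LDV idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the feature dimensions, the single-hidden-layer network, the synthetic data, and all function names are hypothetical and are not taken from the paper. The sketch trains a small regression network on a limited parallel set (acoustic and LDV features), then uses it to generate pseudo-LDV features for a larger acoustic-only set and appends them to the acoustic features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature sizes; the abstract does not state them.
ACOUSTIC_DIM, LDV_DIM, HIDDEN = 40, 24, 64


def init_params(rng):
    """Small random weights for a one-hidden-layer regression network."""
    return {
        "W1": rng.normal(0.0, 0.1, (ACOUSTIC_DIM, HIDDEN)),
        "b1": np.zeros(HIDDEN),
        "W2": rng.normal(0.0, 0.1, (HIDDEN, LDV_DIM)),
        "b2": np.zeros(LDV_DIM),
    }


def forward(params, x):
    """Return hidden activations and predicted LDV features."""
    h = np.tanh(x @ params["W1"] + params["b1"])
    return h, h @ params["W2"] + params["b2"]


def mse(params, x, y):
    _, pred = forward(params, x)
    return float(np.mean((pred - y) ** 2))


def train(params, x, y, lr=0.5, epochs=300):
    """Full-batch gradient descent on the mean squared error."""
    n, d = y.shape
    for _ in range(epochs):
        h, pred = forward(params, x)
        err = (pred - y) * (2.0 / (n * d))        # dMSE/dpred
        dh = (err @ params["W2"].T) * (1.0 - h ** 2)  # backprop through tanh
        params["W2"] -= lr * (h.T @ err)
        params["b2"] -= lr * err.sum(axis=0)
        params["W1"] -= lr * (x.T @ dh)
        params["b1"] -= lr * dh.sum(axis=0)
    return params


# Synthetic stand-ins: a small parallel set and a large acoustic-only set.
x_par = rng.normal(size=(500, ACOUSTIC_DIM))
A = rng.normal(0.0, 0.1, (ACOUSTIC_DIM, LDV_DIM))
y_par = np.tanh(x_par @ A) + 0.01 * rng.normal(size=(500, LDV_DIM))
x_big = rng.normal(size=(5000, ACOUSTIC_DIM))

params = init_params(rng)
initial_loss = mse(params, x_par, y_par)
params = train(params, x_par, y_par)
final_loss = mse(params, x_par, y_par)

# Generate pseudo-LDV features and append them to the acoustic features.
_, pseudo_ldv = forward(params, x_big)
augmented = np.concatenate([x_big, pseudo_ldv], axis=1)
print(initial_loss, final_loss, augmented.shape)
```

In a full system the augmented features would be the input to the DNN acoustic model; that stage is omitted here.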
Pages: 5
Related papers
50 in total
  • [31] Deep neural network architectures for dysarthric speech analysis and recognition
    Zaidi, Brahim Fares
    Selouani, Sid Ahmed
    Boudraa, Malika
    Sidi Yakoub, Mohammed
    NEURAL COMPUTING & APPLICATIONS, 2021, 33 (15) : 9089 - 9108
  • [32] TOWARDS STRUCTURED DEEP NEURAL NETWORK FOR AUTOMATIC SPEECH RECOGNITION
    Liao, Yi-Hsiu
    Lee, Hung-yi
    Lee, Lin-shan
    2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 137 - 144
  • [33] Noise-Robust Speech Recognition Based on RBF Neural Network
    Hou, Xuemei
    HIGH PERFORMANCE STRUCTURES AND MATERIALS ENGINEERING, PTS 1 AND 2, 2011, 217-218 : 413 - 418
  • [34] NEW TYPES OF DEEP NEURAL NETWORK LEARNING FOR SPEECH RECOGNITION AND RELATED APPLICATIONS: AN OVERVIEW
    Deng, Li
    Hinton, Geoffrey
    Kingsbury, Brian
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 8599 - 8603
  • [35] Speech enhancement from fused features based on deep neural network and gated recurrent unit network
    Wang, Youming
    Han, Jiali
    Zhang, Tianqi
    Qing, Didi
    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2021, 2021 (01)
  • [36] Audio-Visual (Multimodal) Speech Recognition System Using Deep Neural Network
    Paulin, Hebsibah
    Milton, R. S.
    JanakiRaman, S.
    Chandraprabha, K.
    JOURNAL OF TESTING AND EVALUATION, 2019, 47 (06) : 3963 - 3974
  • [37] Speech enhancement from fused features based on deep neural network and gated recurrent unit network
    Youming Wang
    Jiali Han
    Tianqi Zhang
    Didi Qing
    EURASIP Journal on Advances in Signal Processing, 2021
  • [38] Incorporating a Generative Front-end Layer to Deep Neural Network for Noise Robust Automatic Speech Recognition
    Kundu, Souvik
    Sim, Khe Chai
    Gales, Mark
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2359 - 2363
  • [39] A study on Gaussian mixture model deep neural network hybrid-based feature compensation for robust speech recognition in noisy environments
    Yoon, Ki-mu
    Kim, Wooil
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2018, 37 (06) : 506 - 511
  • [40] Deep Neural Network Bottleneck Features for Acoustic Event Recognition
    Mun, Seongkyu
    Shon, Suwon
    Kim, Wooil
    Ko, Hanseok
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2954 - 2957