Deep Neural Network for Robust Speech Recognition With Auxiliary Features From Laser-Doppler Vibrometer Sensor

Cited by: 0
Authors
Xie, Zhipeng [1 ]
Du, Jun [1 ]
McLoughlin, Ian [2 ]
Xu, Yong [3 ]
Ma, Feng [3 ]
Wang, Haikun [3 ]
Affiliations
[1] Univ Sci & Technol China, NELSLIP, Hefei, Anhui, Peoples R China
[2] Univ Kent, Sch Comp, Medway, England
[3] IFlytek Res, Hefei, Anhui, Peoples R China
Source
2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2016
Keywords
laser Doppler vibrometer; auxiliary features; deep neural network; regression model; speech recognition;
DOI
Not available
Chinese Library Classification (CLC)
TP301 [Theory, Methods];
Subject Classification
081202;
Abstract
Recently, the signal captured by a laser Doppler vibrometer (LDV) sensor has been used to improve the noise robustness of automatic speech recognition (ASR) systems by enhancing the acoustic signal prior to feature extraction. This study proposes another approach in which auxiliary features extracted from the LDV signal are used alongside conventional acoustic features to further improve ASR performance, based on the use of a deep neural network (DNN) as the acoustic model. While this approach is promising, the best training data sets for ASR do not include LDV data recorded in parallel with the acoustic signal. Thus, to leverage such existing large-scale speech databases, a regression DNN is designed to map acoustic features to LDV features. This regression DNN is first trained on a limited-size parallel signal data set and then used to form pseudo-LDV features for a massive speech data set, so that an ASR system can be trained on paired acoustic and pseudo-LDV features. Our experiments show that both the features from the limited-scale LDV data set and the massive-scale pseudo-LDV features can train an ASR system that significantly outperforms one using acoustic features alone, in both quiet and noisy environments.
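The core of the proposed approach is the regression DNN that maps acoustic features to pseudo-LDV features, so that a large acoustic-only corpus can be augmented with LDV-like features before acoustic-model training. The code below is a minimal, illustrative sketch of that idea in PyTorch, not the authors' implementation: the feature dimensions, context window, layer sizes, training loop, and dummy data are all assumptions made for the example.

# Illustrative sketch only; hyperparameters and feature dimensions are assumptions,
# not values taken from the paper.
import torch
import torch.nn as nn

ACOUSTIC_DIM = 40      # assumed acoustic feature dimension per frame
LDV_DIM = 40           # assumed LDV feature dimension per frame
CONTEXT = 11           # assumed context window (frames) fed to the regression DNN


class AcousticToLDVRegressor(nn.Module):
    """Regression DNN: maps a window of acoustic frames to one pseudo-LDV frame."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ACOUSTIC_DIM * CONTEXT, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, LDV_DIM),
        )

    def forward(self, x):
        return self.net(x)


def train_regressor(model, acoustic_windows, ldv_targets, epochs=10, lr=1e-3):
    """Train on the small parallel acoustic/LDV set with an MSE objective."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(acoustic_windows), ldv_targets)
        loss.backward()
        opt.step()
    return model


if __name__ == "__main__":
    # Dummy tensors standing in for (a) the small parallel corpus and
    # (b) a large acoustic-only corpus that lacks LDV recordings.
    parallel_acoustic = torch.randn(256, ACOUSTIC_DIM * CONTEXT)
    parallel_ldv = torch.randn(256, LDV_DIM)
    large_corpus_acoustic = torch.randn(10000, ACOUSTIC_DIM * CONTEXT)

    regressor = train_regressor(AcousticToLDVRegressor(),
                                parallel_acoustic, parallel_ldv)

    # Generate pseudo-LDV features for the large corpus, then concatenate them
    # with the acoustic features to form the input of the ASR acoustic-model DNN.
    with torch.no_grad():
        pseudo_ldv = regressor(large_corpus_acoustic)
    asr_input = torch.cat([large_corpus_acoustic, pseudo_ldv], dim=1)
    print(asr_input.shape)  # (10000, ACOUSTIC_DIM * CONTEXT + LDV_DIM)

In this sketch the pseudo-LDV features simply extend the acoustic feature vector; the paper's ASR back end would then be trained on this augmented representation.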
Pages: 5
Related Papers
50 items in total
  • [21] DEEP NEURAL NETWORK FEATURES AND SEMI-SUPERVISED TRAINING FOR LOW RESOURCE SPEECH RECOGNITION
    Thomas, Samuel
    Seltzer, Michael L.
    Church, Kenneth
    Hermansky, Hynek
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 6704 - 6708
  • [22] Neural Network Adaptive Beamforming for Robust Multichannel Speech Recognition
    Li, Bo
    Sainath, Tara N.
    Weiss, Ron J.
    Wilson, Kevin W.
    Bacchiani, Michiel
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1976 - 1980
  • [23] Neural Network Based Recognition of Speech Using MFCC Features
    Barua, Pialy
    Ahmad, Kanij
    Khan, Ainul Anam Shahjamal
    Sanaullah, Muhammad
    2014 INTERNATIONAL CONFERENCE ON INFORMATICS, ELECTRONICS & VISION (ICIEV), 2014,
  • [24] An Improved Tibetan Lhasa Speech Recognition Method Based on Deep Neural Network
    Ruan, Wenbin
    Gan, Zhenye
    Liu, Bin
    Guo, Yin
    2017 10TH INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTATION TECHNOLOGY AND AUTOMATION (ICICTA 2017), 2017, : 303 - 306
  • [25] Deep Neural Network Based Speech Recognition Systems Under Noise Perturbations
    An, Qiyuan
    Bai, Kangjun
    Zhang, Moqi
    Yi, Yang
    Liu, Yifang
    PROCEEDINGS OF THE TWENTY-FIRST INTERNATIONAL SYMPOSIUM ON QUALITY ELECTRONIC DESIGN (ISQED 2020), 2020, : 377 - 382
  • [26] Predominant Instrument Recognition Based on Deep Neural Network With Auxiliary Classification
    Yu, Dongyan
    Duan, Huiping
    Fang, Jun
    Zeng, Bing
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 852 - 861
  • [27] Employing Robust Principal Component Analysis for Noise-Robust Speech Feature Extraction in Automatic Speech Recognition with the Structure of a Deep Neural Network
    Hung, Jeih-weih
    Lin, Jung-Shan
    Wu, Po-Jen
    APPLIED SYSTEM INNOVATION, 2018, 1 (03) : 1 - 14
  • [28] A Study on Speech Emotion Recognition Using a Deep Neural Network
    Lee, Kyong Hee
    Choi, Hyun Kyun
    Jang, Byung Tae
    Kim, Do Hyun
    2019 10TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY CONVERGENCE (ICTC): ICT CONVERGENCE LEADING THE AUTONOMOUS FUTURE, 2019, : 1162 - 1165
  • [29] Deep neural network architectures for dysarthric speech analysis and recognition
    Brahim Fares Zaidi
    Sid Ahmed Selouani
    Malika Boudraa
    Mohammed Sidi Yakoub
    Neural Computing and Applications, 2021, 33 : 9089 - 9108
  • [30] Transfer Learning of Deep Neural Network for Speech Emotion Recognition
    Huang, Ying
    Hu, Mingqing
    Yu, Xianguo
    Wang, Tao
    Yang, Chen
    PATTERN RECOGNITION (CCPR 2016), PT II, 2016, 663 : 721 - 729