Auxiliary Features from Laser-Doppler Vibrometer Sensor for Deep Neural Network Based Robust Speech Recognition

Cited by: 7
Authors
Sun, Lei [1]
Du, Jun [2]
Xie, Zhipeng [3]
Xu, Yong [4]
Affiliations
[1] Univ Sci & Technol China, 96 JinZhai Rd, Hefei, Anhui, Peoples R China
[2] Univ Sci & Technol China, iFlytek Speech Lab, 96 JinZhai Rd, Hefei, Anhui, Peoples R China
[3] iFlytek Co Ltd, iFlytek Res, Hefei, Anhui, Peoples R China
[4] Univ Surrey, Guildford GU2 7XH, Surrey, England
Source
JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY | 2018, Vol. 90, No. 07
Funding
National Natural Science Foundation of China;
Keywords
Laser Doppler vibrometer; Auxiliary features; Deep neural network; Regression model; Speech recognition; NOISE;
DOI
10.1007/s11265-017-1287-x
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Recently, signals captured by a laser Doppler vibrometer (LDV) sensor have been shown to improve the noise robustness of automatic speech recognition (ASR) systems by enhancing the acoustic signal prior to feature extraction. In this study, an alternative approach is proposed: auxiliary features extracted from the LDV signal are concatenated with the conventional acoustic features to further improve ASR performance with deep neural network (DNN) acoustic models. Preliminary experiments on a small stereo data set containing both LDV and acoustic signals demonstrate its effectiveness. To leverage existing large-scale speech databases, a regression DNN is then designed to map acoustic features to LDV features. It is trained on the limited-size stereo data set and then used to generate pseudo-LDV features from a massive speech data set for parallel training of an ASR system. Our experiments verify that both the features from the limited-scale LDV data set and the massive-scale pseudo-LDV features yield significant improvements in recognition performance over a system using purely acoustic features, in both quiet and noisy environments.
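The two steps named in the abstract (regressing LDV features from acoustic features, then concatenating the result with the acoustic features) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no architecture or feature dimensions, so the one-hidden-layer tanh network stands in for the regression DNN, and the 24-dimensional acoustic frames, 8-dimensional LDV frames, the synthetic data, and the names `train_regression_mlp` and `fuse_features` are all assumptions made for illustration.

```python
import numpy as np


def train_regression_mlp(acoustic, ldv, hidden=32, lr=0.1, epochs=200, seed=0):
    """Fit a one-hidden-layer tanh MLP (a small stand-in for a regression DNN)
    mapping acoustic frames to LDV frames by minimising mean squared error.
    Returns a predict function and the per-epoch training loss."""
    rng = np.random.default_rng(seed)
    d_in, d_out = acoustic.shape[1], ldv.shape[1]
    w1 = rng.normal(0.0, 0.1, (d_in, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.1, (hidden, d_out))
    b2 = np.zeros(d_out)
    losses = []
    for _ in range(epochs):
        # Forward pass over the whole (small) stereo set.
        h = np.tanh(acoustic @ w1 + b1)
        err = h @ w2 + b2 - ldv
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: gradients of the mean squared error.
        g_out = 2.0 * err / err.size
        g_h = (g_out @ w2.T) * (1.0 - h ** 2)
        w2 -= lr * (h.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        w1 -= lr * (acoustic.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)

    def predict(x):
        return np.tanh(x @ w1 + b1) @ w2 + b2

    return predict, losses


def fuse_features(acoustic, ldv_like):
    """Concatenate acoustic and (pseudo-)LDV features frame by frame."""
    if acoustic.shape[0] != ldv_like.shape[0]:
        raise ValueError("both streams must have the same number of frames")
    return np.concatenate([acoustic, ldv_like], axis=1)


# Synthetic stand-ins for real data (assumed sizes, not from the paper).
rng = np.random.default_rng(1)
stereo_acoustic = rng.normal(size=(200, 24))      # small stereo set, acoustic side
mix = rng.normal(0.0, 0.3, (24, 8))
stereo_ldv = np.tanh(stereo_acoustic @ mix)       # small stereo set, LDV side

predict_ldv, losses = train_regression_mlp(stereo_acoustic, stereo_ldv)

large_acoustic = rng.normal(size=(1000, 24))      # larger acoustic-only set
pseudo_ldv = predict_ldv(large_acoustic)          # generated pseudo-LDV features
fused = fuse_features(large_acoustic, pseudo_ldv) # input for acoustic-model training
```

In this sketch the fused matrix simply has the acoustic columns followed by the pseudo-LDV columns; a real system would feed such frames (typically with context windows and normalisation) to the DNN acoustic model.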
Pages: 975-983
Number of pages: 9