Auxiliary Features from Laser-Doppler Vibrometer Sensor for Deep Neural Network Based Robust Speech Recognition

Cited by: 7
Authors
Sun, Lei [1]
Du, Jun [2]
Xie, Zhipeng [3]
Xu, Yong [4]
Affiliations
[1] Univ Sci & Technol China, 96 JinZhai Rd, Hefei, Anhui, Peoples R China
[2] Univ Sci & Technol China, iFlytek Speech Lab, 96 JinZhai Rd, Hefei, Anhui, Peoples R China
[3] iFlytek Co Ltd, iFlytek Res, Hefei, Anhui, Peoples R China
[4] Univ Surrey, Guildford GU2 7XH, Surrey, England
Source
JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY | 2018 / Vol. 90 / Issue 07
Funding
National Natural Science Foundation of China;
Keywords
Laser Doppler vibrometer; Auxiliary features; Deep neural network; Regression model; Speech recognition; NOISE;
DOI
10.1007/s11265-017-1287-x
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Recently, signals captured by a laser Doppler vibrometer (LDV) sensor have been shown to improve the noise robustness of automatic speech recognition (ASR) systems by enhancing the acoustic signal prior to feature extraction. In this study, an alternative approach is proposed to further improve ASR performance when a deep neural network (DNN) is used for acoustic modeling: concatenating auxiliary features extracted from the LDV signal with the conventional acoustic features. Preliminary experiments on a small set of stereo data, comprising both LDV and acoustic signals, demonstrate its effectiveness. To leverage existing large-scale speech databases, a regression DNN is then designed to map acoustic features to LDV features; it is trained on a stereo data set of limited size and then used to generate pseudo-LDV features from a massive speech data set for parallel training of an ASR system. Our experiments verify that both the features from the limited-scale LDV data set and the massive-scale pseudo-LDV features yield significant improvements in recognition performance over a system using purely acoustic features, in both quiet and noisy environments.
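The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a ridge-regularised linear map stands in for the regression DNN, the feature dimensions (40 acoustic, 13 LDV) are hypothetical, and the data are synthetic random numbers. It only shows the three steps: fit an acoustic-to-LDV regressor on a small stereo set, generate pseudo-LDV features for a larger acoustic-only set, and concatenate them frame by frame with the acoustic features.

```python
import numpy as np


def fit_linear_regressor(acoustic, ldv, ridge=1e-3):
    """Fit an affine map acoustic -> LDV by ridge regression.

    Assumption: this linear model is only a stand-in for the
    regression DNN mentioned in the abstract.
    """
    # Append a constant column so the map includes a bias term.
    x = np.hstack([acoustic, np.ones((acoustic.shape[0], 1))])
    gram = x.T @ x + ridge * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ ldv)


def predict_pseudo_ldv(weights, acoustic):
    """Generate pseudo-LDV features from acoustic features alone."""
    x = np.hstack([acoustic, np.ones((acoustic.shape[0], 1))])
    return x @ weights


def concat_features(acoustic, auxiliary):
    """Concatenate acoustic and auxiliary features for each frame."""
    if acoustic.shape[0] != auxiliary.shape[0]:
        raise ValueError("acoustic and auxiliary frame counts differ")
    return np.hstack([acoustic, auxiliary])


rng = np.random.default_rng(0)

# Small synthetic "stereo" set: paired acoustic and LDV frames.
# Dimensions are hypothetical, chosen only for illustration.
stereo_acoustic = rng.normal(size=(200, 40))
true_map = rng.normal(size=(40, 13))
stereo_ldv = stereo_acoustic @ true_map + 0.01 * rng.normal(size=(200, 13))

# Step 1: train the regressor on the limited stereo data.
weights = fit_linear_regressor(stereo_acoustic, stereo_ldv)

# Step 2: generate pseudo-LDV features for a larger acoustic-only set.
large_acoustic = rng.normal(size=(1000, 40))
pseudo_ldv = predict_pseudo_ldv(weights, large_acoustic)

# Step 3: build the augmented input for the acoustic model.
augmented = concat_features(large_acoustic, pseudo_ldv)
print(augmented.shape)  # (1000, 53)
```

In a real system the augmented frames would feed the DNN acoustic model, and the regressor would be a nonlinear network trained on actual LDV recordings; neither is reproduced here.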
Pages: 975 - 983
Number of pages: 9