Auxiliary Features from Laser-Doppler Vibrometer Sensor for Deep Neural Network Based Robust Speech Recognition

Cited by: 7
Authors
Sun, Lei [1]
Du, Jun [2]
Xie, Zhipeng [3]
Xu, Yong [4]
Affiliations
[1] Univ Sci & Technol China, 96 JinZhai Rd, Hefei, Anhui, Peoples R China
[2] Univ Sci & Technol China, iFlytek Speech Lab, 96 JinZhai Rd, Hefei, Anhui, Peoples R China
[3] iFlytek Co Ltd, iFlytek Res, Hefei, Anhui, Peoples R China
[4] Univ Surrey, Guildford GU2 7XH, Surrey, England
Source
JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY | 2018, Vol. 90, No. 7
Funding
National Natural Science Foundation of China
Keywords
Laser Doppler vibrometer; Auxiliary features; Deep neural network; Regression model; Speech recognition; NOISE;
DOI
10.1007/s11265-017-1287-x
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Recently, signals captured by a laser Doppler vibrometer (LDV) sensor have been shown to improve the noise robustness of automatic speech recognition (ASR) systems by enhancing the acoustic signal prior to feature extraction. In this study, an alternative approach is proposed to further improve the performance of ASR systems that use a deep neural network (DNN) for acoustic modeling: concatenating auxiliary features extracted from the LDV signal with the conventional acoustic features. Preliminary experiments on a small set of stereo data, comprising both LDV and acoustic signals, demonstrate its effectiveness. To leverage existing large-scale speech databases, a regression DNN is then designed to map acoustic features to LDV features; it is trained on the limited-size stereo data set and then used to generate pseudo-LDV features from a massive speech data set for parallel training of an ASR system. Our experiments verify that both the features from the limited-scale LDV data set and the massive-scale pseudo-LDV features yield significant improvements in recognition performance over a system using purely acoustic features, in both quiet and noisy environments.
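The two steps named in the abstract, mapping acoustic features to LDV features and then concatenating the result with the acoustic features, can be illustrated with a minimal sketch. It is not the authors' implementation: a ridge-regularized linear least-squares map stands in for the regression DNN, the data are random placeholders, and the feature dimensions (24 acoustic, 8 LDV), the data set sizes, and all function names are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear_map(acoustic, ldv, ridge=1e-3):
    """Fit a ridge-regularized linear map (with bias) from acoustic to LDV frames.

    Stand-in for the regression DNN described in the abstract.
    """
    X = np.hstack([acoustic, np.ones((acoustic.shape[0], 1))])  # append bias column
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ ldv)  # shape: (acoustic_dim + 1, ldv_dim)

def predict_pseudo_ldv(acoustic, weights):
    """Generate pseudo-LDV features for frames that have no LDV recording."""
    X = np.hstack([acoustic, np.ones((acoustic.shape[0], 1))])
    return X @ weights

def concat_features(acoustic, aux):
    """Frame-wise concatenation of acoustic and auxiliary features."""
    return np.concatenate([acoustic, aux], axis=1)

# Hypothetical small stereo set: frames with both acoustic and LDV features.
stereo_acoustic = rng.standard_normal((200, 24))
true_map = rng.standard_normal((24, 8))  # synthetic relationship, for the demo only
stereo_ldv = stereo_acoustic @ true_map + 0.01 * rng.standard_normal((200, 8))

# Step 1: learn the acoustic-to-LDV mapping on the small stereo set.
weights = fit_linear_map(stereo_acoustic, stereo_ldv)

# Step 2: generate pseudo-LDV features for a larger acoustic-only set.
large_acoustic = rng.standard_normal((1000, 24))
pseudo_ldv = predict_pseudo_ldv(large_acoustic, weights)

# Step 3: concatenate to form the input of the acoustic model.
dnn_input = concat_features(large_acoustic, pseudo_ldv)
print(dnn_input.shape)  # (1000, 32)
```

In this sketch the concatenated matrix would be fed to the acoustic model in place of the purely acoustic features; a nonlinear regression network would replace `fit_linear_map` in a setup closer to the one the abstract describes.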
Pages: 975-983
Number of pages: 9