Deep Neural Network for Robust Speech Recognition With Auxiliary Features From Laser-Doppler Vibrometer Sensor

Cited by: 0

Authors:

Xie, Zhipeng [1]

Du, Jun [1]

McLoughlin, Ian [2]

Xu, Yong [3]

Ma, Feng [3]

Wang, Haikun [3]

Affiliations:

[1] University of Science and Technology of China, NELSLIP, Hefei, Anhui, People's Republic of China

[2] University of Kent, School of Computing, Medway, England

[3] iFlytek Research, Hefei, Anhui, People's Republic of China

Source:

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP) | 2016

Keywords:

laser Doppler vibrometer; auxiliary features; deep neural network; regression model; speech recognition

DOI:

Not available

Chinese Library Classification number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Recently, the signal captured from a laser Doppler vibrometer (LDV) sensor has been used to improve the noise robustness of automatic speech recognition (ASR) systems by enhancing the acoustic signal prior to feature extraction. This study proposes another approach in which auxiliary features extracted from the LDV signal are used alongside conventional acoustic features to further improve ASR performance, with a deep neural network (DNN) as the acoustic model. While this approach is promising, the best training data sets for ASR do not include LDV data in parallel with the acoustic signal. Thus, to leverage such existing large-scale speech databases, a regression DNN is designed to map acoustic features to LDV features. This regression DNN is trained on a limited-size parallel data set, then used to generate pseudo-LDV features for a massive speech data set, so that an ASR system can be trained on paired acoustic and pseudo-LDV features. Our experiments show that both the features from the limited-scale LDV data set and the massive-scale pseudo-LDV features are able to train an ASR system that significantly outperforms one using acoustic features alone, in both quiet and noisy environments.
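The data flow described in the abstract (fit a regression model on a small parallel set, generate pseudo-LDV features for a larger acoustic-only set, then append them to the acoustic features) can be sketched roughly as follows. This is a toy illustration only, not the authors' network: the single tanh hidden layer, the feature dimensions, the synthetic data, and every function name (`train_regression`, `append_pseudo_ldv`, and so on) are assumptions made for the example.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one fully connected layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(x, params):
    """Return (hidden activations, predicted LDV features) for one frame."""
    (w1, b1), (w2, b2) = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    y = [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(w2, b2)]
    return h, y

def train_regression(pairs, n_hidden=8, epochs=200, lr=0.05, seed=0):
    """Fit acoustic -> LDV features with per-frame SGD on squared error."""
    rng = random.Random(seed)
    n_in, n_out = len(pairs[0][0]), len(pairs[0][1])
    params = [init_layer(n_in, n_hidden, rng), init_layer(n_hidden, n_out, rng)]
    (w1, b1), (w2, b2) = params
    for _ in range(epochs):
        for x, target in pairs:
            h, y = forward(x, params)
            dy = [yi - ti for yi, ti in zip(y, target)]  # d(0.5 * error^2) / dy
            # Back-propagate through tanh before the output weights are updated.
            dh = [(1.0 - h[j] ** 2) * sum(dy[k] * w2[k][j] for k in range(n_out))
                  for j in range(n_hidden)]
            for k in range(n_out):
                b2[k] -= lr * dy[k]
                for j in range(n_hidden):
                    w2[k][j] -= lr * dy[k] * h[j]
            for j in range(n_hidden):
                b1[j] -= lr * dh[j]
                for i in range(n_in):
                    w1[j][i] -= lr * dh[j] * x[i]
    return params

def mean_squared_error(pairs, params):
    """Average per-frame squared error between predicted and true LDV features."""
    total = 0.0
    for x, target in pairs:
        _, y = forward(x, params)
        total += sum((yi - ti) ** 2 for yi, ti in zip(y, target))
    return total / len(pairs)

def append_pseudo_ldv(acoustic_frames, params):
    """Concatenate each acoustic frame with its predicted (pseudo) LDV features."""
    return [frame + forward(frame, params)[1] for frame in acoustic_frames]

# Synthetic stand-ins (assumed): 3-dim "acoustic" frames, 2-dim "LDV" features.
data_rng = random.Random(1)

def toy_acoustic_frame():
    return [data_rng.uniform(-1.0, 1.0) for _ in range(3)]

def toy_ldv_features(x):
    return [math.sin(x[0]) + 0.5 * x[1], x[2] * x[0]]

# Small parallel set (acoustic + LDV) and a larger acoustic-only set.
parallel_set = [(x, toy_ldv_features(x)) for x in (toy_acoustic_frame() for _ in range(40))]
large_acoustic_set = [toy_acoustic_frame() for _ in range(200)]

untrained = train_regression(parallel_set, epochs=0)  # same seed, no updates
trained = train_regression(parallel_set, epochs=200)
mse_before = mean_squared_error(parallel_set, untrained)
mse_after = mean_squared_error(parallel_set, trained)

# Augmented frames would feed the acoustic-model DNN in place of acoustic-only input.
augmented = append_pseudo_ldv(large_acoustic_set, trained)
print(f"MSE before: {mse_before:.4f}, after: {mse_after:.4f}, frame dim: {len(augmented[0])}")
```

In a real system the regression model would be a deeper network trained on actual acoustic and LDV feature vectors; the sketch only shows how the acoustic-only corpus gains a pseudo-LDV feature stream.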


Pages: 5
