Deep Neural Network for Robust Speech Recognition With Auxiliary Features From Laser-Doppler Vibrometer Sensor

Cited by: 0
Authors
Xie, Zhipeng [1 ]
Du, Jun [1 ]
McLoughlin, Ian [2 ]
Xu, Yong [3 ]
Ma, Feng [3 ]
Wang, Haikun [3 ]
Affiliations
[1] Univ Sci & Technol China, NELSLIP, Hefei, Anhui, Peoples R China
[2] Univ Kent, Sch Comp, Medway, England
[3] IFlytek Res, Hefei, Anhui, Peoples R China
Source
2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2016
Keywords
laser Doppler vibrometer; auxiliary features; deep neural network; regression model; speech recognition;
DOI
Not available
Chinese Library Classification (CLC)
TP301 [Theory, Methods];
Subject Classification
081202;
Abstract
Recently, the signal captured by a laser Doppler vibrometer (LDV) sensor has been used to improve the noise robustness of automatic speech recognition (ASR) systems by enhancing the acoustic signal prior to feature extraction. This study proposes another approach in which auxiliary features extracted from the LDV signal are used alongside conventional acoustic features to further improve ASR performance, based on the use of a deep neural network (DNN) as the acoustic model. While this approach is promising, the best training data sets for ASR do not include LDV data recorded in parallel with the acoustic signal. Thus, to leverage such existing large-scale speech databases, a regression DNN is designed to map acoustic features to LDV features. This regression DNN is first trained on a limited-size parallel signal data set and then used to form pseudo-LDV features for a massive speech data set, so that an ASR system can be trained on paired acoustic and pseudo-LDV features. Our experiments show that both the features from the limited-scale LDV data set and the massive-scale pseudo-LDV features can train an ASR system that significantly outperforms one using acoustic features alone, in both quiet and noisy environments.
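The core of the proposed approach is the regression DNN that maps acoustic features to pseudo-LDV features, so that a large acoustic-only corpus can be augmented with LDV-like features before acoustic-model training. The code below is a minimal, illustrative sketch of that idea in PyTorch, not the authors' implementation: the feature dimensions, context window, layer sizes, training loop, and dummy data are all assumptions made for the example.

# Illustrative sketch only; hyperparameters and feature dimensions are assumptions,
# not values taken from the paper.
import torch
import torch.nn as nn

ACOUSTIC_DIM = 40      # assumed acoustic feature dimension per frame
LDV_DIM = 40           # assumed LDV feature dimension per frame
CONTEXT = 11           # assumed context window (frames) fed to the regression DNN


class AcousticToLDVRegressor(nn.Module):
    """Regression DNN: maps a window of acoustic frames to one pseudo-LDV frame."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ACOUSTIC_DIM * CONTEXT, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, LDV_DIM),
        )

    def forward(self, x):
        return self.net(x)


def train_regressor(model, acoustic_windows, ldv_targets, epochs=10, lr=1e-3):
    """Train on the small parallel acoustic/LDV set with an MSE objective."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(acoustic_windows), ldv_targets)
        loss.backward()
        opt.step()
    return model


if __name__ == "__main__":
    # Dummy tensors standing in for (a) the small parallel corpus and
    # (b) a large acoustic-only corpus that lacks LDV recordings.
    parallel_acoustic = torch.randn(256, ACOUSTIC_DIM * CONTEXT)
    parallel_ldv = torch.randn(256, LDV_DIM)
    large_corpus_acoustic = torch.randn(10000, ACOUSTIC_DIM * CONTEXT)

    regressor = train_regressor(AcousticToLDVRegressor(),
                                parallel_acoustic, parallel_ldv)

    # Generate pseudo-LDV features for the large corpus, then concatenate them
    # with the acoustic features to form the input of the ASR acoustic-model DNN.
    with torch.no_grad():
        pseudo_ldv = regressor(large_corpus_acoustic)
    asr_input = torch.cat([large_corpus_acoustic, pseudo_ldv], dim=1)
    print(asr_input.shape)  # (10000, ACOUSTIC_DIM * CONTEXT + LDV_DIM)

In this sketch the pseudo-LDV features simply extend the acoustic feature vector; the paper's ASR back end would then be trained on this augmented representation.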
Pages: 5
Related Papers
50 items in total
  • [21] DEEP NEURAL NETWORK FEATURES AND SEMI-SUPERVISED TRAINING FOR LOW RESOURCE SPEECH RECOGNITION
    Thomas, Samuel
    Seltzer, Michael L.
    Church, Kenneth
    Hermansky, Hynek
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 6704 - 6708
  • [22] Neural Network Adaptive Beamforming for Robust Multichannel Speech Recognition
    Li, Bo
    Sainath, Tara N.
    Weiss, Ron J.
    Wilson, Kevin W.
    Bacchiani, Michiel
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1976 - 1980
  • [23] Neural Network Based Recognition of Speech Using MFCC Features
    Barua, Pialy
    Ahmad, Kanij
    Khan, Ainul Anam Shahjamal
    Sanaullah, Muhammad
    2014 INTERNATIONAL CONFERENCE ON INFORMATICS, ELECTRONICS & VISION (ICIEV), 2014,
  • [24] An Improved Tibetan Lhasa Speech Recognition Method Based on Deep Neural Network
    Ruan, Wenbin
    Gan, Zhenye
    Liu, Bin
    Guo, Yin
    2017 10TH INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTATION TECHNOLOGY AND AUTOMATION (ICICTA 2017), 2017, : 303 - 306
  • [25] Deep Neural Network Based Speech Recognition Systems Under Noise Perturbations
    An, Qiyuan
    Bai, Kangjun
    Zhang, Moqi
    Yi, Yang
    Liu, Yifang
    PROCEEDINGS OF THE TWENTY-FIRST INTERNATIONAL SYMPOSIUM ON QUALITY ELECTRONIC DESIGN (ISQED 2020), 2020, : 377 - 382
  • [26] Predominant Instrument Recognition Based on Deep Neural Network With Auxiliary Classification
    Yu, Dongyan
    Duan, Huiping
    Fang, Jun
    Zeng, Bing
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 852 - 861
  • [27] Employing Robust Principal Component Analysis for Noise-Robust Speech Feature Extraction in Automatic Speech Recognition with the Structure of a Deep Neural Network
    Hung, Jeih-weih
    Lin, Jung-Shan
    Wu, Po-Jen
    APPLIED SYSTEM INNOVATION, 2018, 1 (03) : 1 - 14
  • [28] A Study on Speech Emotion Recognition Using a Deep Neural Network
    Lee, Kyong Hee
    Choi, Hyun Kyun
    Jang, Byung Tae
    Kim, Do Hyun
    2019 10TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY CONVERGENCE (ICTC): ICT CONVERGENCE LEADING THE AUTONOMOUS FUTURE, 2019, : 1162 - 1165
  • [29] Deep neural network architectures for dysarthric speech analysis and recognition
    Brahim Fares Zaidi
    Sid Ahmed Selouani
    Malika Boudraa
    Mohammed Sidi Yakoub
    Neural Computing and Applications, 2021, 33 : 9089 - 9108
  • [30] Transfer Learning of Deep Neural Network for Speech Emotion Recognition
    Huang, Ying
    Hu, Mingqing
    Yu, Xianguo
    Wang, Tao
    Yang, Chen
    PATTERN RECOGNITION (CCPR 2016), PT II, 2016, 663 : 721 - 729