Exploiting foreign resources for DNN-based ASR

Cited by: 9

Authors:

Motlicek, Petr [1]

Imseng, David [1]

Potard, Blaise [1]

Garner, Philip N. [1]

Himawan, Ivan [1]

Affiliations:

[1] Idiap Res Inst, Martigny, Switzerland

Source:

EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING | 2015

Keywords:

Automatic speech recognition; Deep learning for speech; Acoustic model adaptation; Semi-supervised training; SPEECH; ALGORITHM; FEATURES;

DOI:

10.1186/s13636-015-0058-5

Chinese Library Classification number:

O42 [Acoustics];

Subject classification code:

070206 ; 082403 ;

Abstract:

Manual transcription of audio databases for the development of automatic speech recognition (ASR) systems is a costly and time-consuming process. In the context of deriving acoustic models adapted to a specific application, or in low-resource scenarios, it is therefore essential to explore alternatives capable of improving speech recognition results. In this paper, we investigate the relevance of foreign data characteristics, in particular domain and language, when such data is used as an auxiliary source for training ASR acoustic models based on deep neural networks (DNNs). The acoustic models are evaluated on a challenging bilingual database within the scope of the MediaParl project. Experimental results suggest that in-language (but out-of-domain) data is more beneficial than in-domain (but out-of-language) data when employed in either supervised or semi-supervised training of DNNs. The best-performing ASR system, an HMM/GMM acoustic model that uses a DNN as a discriminatively trained feature extractor, outperforms the best-performing HMM/DNN hybrid by about 5% relative (in terms of WER). The accumulated relative gain over the MFCC-HMM/GMM baseline is about 30% WER.
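
The relative gains quoted in the abstract are relative reductions in word error rate (WER), not absolute percentage-point differences. A minimal sketch of that arithmetic follows, assuming the usual definition (baseline WER minus system WER, divided by baseline WER). The function name and the WER values are hypothetical and chosen only for illustration; they are not figures from the paper.

```python
def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Return the relative WER reduction of a system over a baseline, as a fraction."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - system_wer) / baseline_wer


# Hypothetical WER values (in percent), for illustration only; not from the paper.
baseline = 20.0
improved = 14.0

reduction = relative_wer_reduction(baseline, improved)
print(f"Relative WER reduction: {reduction:.1%}")  # prints "Relative WER reduction: 30.0%"
```

With these illustrative numbers, a drop of 6 absolute WER points corresponds to a 30% relative reduction, which is why relative and absolute gains should not be compared directly.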


