Exploiting foreign resources for DNN-based ASR

Cited by: 9

Authors:

Motlicek, Petr [1]

Imseng, David [1]

Potard, Blaise [1]

Garner, Philip N. [1]

Himawan, Ivan [1]

Affiliations:

[1] Idiap Research Institute, Martigny, Switzerland

Source:

EURASIP Journal on Audio, Speech, and Music Processing | 2015

Keywords:

Automatic speech recognition; Deep learning for speech; Acoustic model adaptation; Semi-supervised training; Speech; Algorithm; Features

DOI:

10.1186/s13636-015-0058-5

Chinese Library Classification:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

Manual transcription of audio databases for the development of automatic speech recognition (ASR) systems is a costly and time-consuming process. When deriving acoustic models adapted to a specific application, or in low-resource scenarios, it is therefore essential to explore alternatives capable of improving speech recognition results. In this paper, we investigate the relevance of foreign data characteristics, in particular domain and language, when such data is used as an auxiliary source for training ASR acoustic models based on deep neural networks (DNNs). The acoustic models are evaluated on a challenging bilingual database within the scope of the MediaParl project. Experimental results suggest that in-language (but out-of-domain) data is more beneficial than in-domain (but out-of-language) data when employed in either supervised or semi-supervised training of DNNs. The best-performing ASR system, an HMM/GMM acoustic model that exploits a DNN as a discriminatively trained feature extractor, outperforms the best-performing HMM/DNN hybrid by about 5% relative in terms of word error rate (WER). The accumulated relative gain over the MFCC-HMM/GMM baseline is about 30% in WER.
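
The abstract reports its gains as relative WER changes. As a minimal sketch of how such a figure is usually computed, the hypothetical helper below (its name and the example WER values are illustrative assumptions, not taken from the paper) expresses the reduction as a percentage of the baseline WER.

```python
def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Return the relative WER reduction, in percent of the baseline WER.

    Both arguments are absolute WERs in percent. A positive result means
    the system has a lower (better) WER than the baseline.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


# Hypothetical numbers chosen only for illustration (NOT results from the paper):
# a baseline at 20.0% WER and an improved system at 14.0% WER.
gain = relative_wer_reduction(20.0, 14.0)
print(f"relative WER reduction: {gain:.1f}%")
```

Under this convention, a 30% relative gain means the error rate fell by 30% of its baseline value, not by 30 absolute percentage points.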


