Exploiting foreign resources for DNN-based ASR

Times cited: 9

Authors:

Motlicek, Petr [1]

Imseng, David [1]

Potard, Blaise [1]

Garner, Philip N. [1]

Himawan, Ivan [1]

Affiliation:

[1] Idiap Res Inst, Martigny, Switzerland

Source:

EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING | 2015

Keywords:

Automatic speech recognition; Deep learning for speech; Acoustic model adaptation; Semi-supervised training; SPEECH; ALGORITHM; FEATURES;

DOI:

10.1186/s13636-015-0058-5

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Manual transcription of audio databases for the development of automatic speech recognition (ASR) systems is a costly and time-consuming process. When deriving acoustic models adapted to a specific application, or in low-resource scenarios, it is therefore essential to explore alternatives to manual transcription that can still improve speech recognition results. In this paper, we investigate the relevance of foreign data characteristics, in particular domain and language, when such data is used as an auxiliary source for training ASR acoustic models based on deep neural networks (DNNs). The acoustic models are evaluated on a challenging bilingual database within the scope of the MediaParl project. Experimental results suggest that in-language (but out-of-domain) data is more beneficial than in-domain (but out-of-language) data when employed in either supervised or semi-supervised training of DNNs. The best-performing ASR system, an HMM/GMM acoustic model that exploits a DNN as a discriminatively trained feature extractor, outperforms the best-performing hybrid HMM/DNN system by about 5% relative in terms of word error rate (WER). The accumulated relative WER reduction with respect to the MFCC-based HMM/GMM baseline is about 30%.
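The abstract reports its results as relative WER reductions. The sketch below shows how WER and relative reductions are commonly computed; it is illustrative only. The function names (`wer`, `relative_reduction`, `accumulate`) are hypothetical, the WER values in the usage part are made up and are not results from the paper, and this record does not say which scoring tool the authors used. Successive relative gains compound multiplicatively rather than add: a 20% gain followed by a further 5% gain gives 24% overall, not 25%.

```python
"""Word error rate (WER) and relative WER reduction: an illustrative sketch."""


def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein (edit-distance) alignment."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def relative_reduction(baseline_wer: float, system_wer: float) -> float:
    """Relative WER reduction of a system with respect to a baseline."""
    return (baseline_wer - system_wer) / baseline_wer


def accumulate(*relative_gains: float) -> float:
    """Combine successive relative reductions: they compound, they do not add."""
    remaining = 1.0
    for r in relative_gains:
        remaining *= 1.0 - r
    return 1.0 - remaining


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words.
    print(wer("the cat sat on the mat", "the cat sit on mat"))
    # Hypothetical WERs in percent, NOT taken from the paper.
    print(relative_reduction(20.0, 15.0))
    # A 20% gain followed by a further 5% gain.
    print(accumulate(0.20, 0.05))
```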


Pages: 1-10

Number of pages: 10
