Exploiting foreign resources for DNN-based ASR

Cited by: 9
Authors
Motlicek, Petr [1]
Imseng, David [1]
Potard, Blaise [1]
Garner, Philip N. [1]
Himawan, Ivan [1]
Affiliations
[1] Idiap Research Institute, Martigny, Switzerland
Source
EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING | 2015
Keywords
Automatic speech recognition; Deep learning for speech; Acoustic model adaptation; Semi-supervised training; SPEECH; ALGORITHM; FEATURES;
DOI
10.1186/s13636-015-0058-5
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Manual transcription of audio databases for the development of automatic speech recognition (ASR) systems is a costly and time-consuming process. In the context of deriving acoustic models adapted to a specific application, or in low-resource scenarios, it is therefore essential to explore alternatives capable of improving speech recognition results. In this paper, we investigate the relevance of foreign data characteristics, in particular domain and language, when using this data as an auxiliary data source for training ASR acoustic models based on deep neural networks (DNNs). The acoustic models are evaluated on a challenging bilingual database within the scope of the MediaParl project. Experimental results suggest that in-language (but out-of-domain) data is more beneficial than in-domain (but out-of-language) data when employed in either supervised or semi-supervised training of DNNs. The best performing ASR system, an HMM/GMM acoustic model that exploits a DNN as a discriminatively trained feature extractor, outperforms the best performing HMM/DNN hybrid by about 5% relative (in terms of WER). The accumulated relative gain over the MFCC-HMM/GMM baseline is about 30% in WER.
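The abstract reports its gains as relative WER improvements. As a reading aid, the sketch below shows how such a figure is conventionally computed. It assumes the standard definition of WER (substitutions, deletions and insertions divided by the number of reference words) and the usual definition of relative reduction; the function names and the numbers in the example are hypothetical and are not taken from the paper.

```python
def wer(substitutions: int, deletions: int, insertions: int, reference_words: int) -> float:
    """Word error rate in percent, using the standard (S + D + I) / N definition."""
    if reference_words <= 0:
        raise ValueError("reference_words must be positive")
    return 100.0 * (substitutions + deletions + insertions) / reference_words


def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Relative WER reduction in percent of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


# Hypothetical figures chosen only for illustration; they are NOT results from the paper.
hypothetical_baseline_wer = 20.0
hypothetical_system_wer = 14.0
gain = relative_wer_reduction(hypothetical_baseline_wer, hypothetical_system_wer)
print(f"relative WER reduction: {gain:.1f} %")
```

Under this definition, a relative gain is always smaller in absolute terms than it sounds: in the illustrative case above, a 30 % relative reduction corresponds to 6 absolute WER points.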
Pages: 1 / 10
Page count: 10