Exploiting foreign resources for DNN-based ASR

Cited by: 9
Authors
Motlicek, Petr [1]
Imseng, David [1]
Potard, Blaise [1]
Garner, Philip N. [1]
Himawan, Ivan [1]
Affiliation
[1] Idiap Res Inst, Martigny, Switzerland
Source
EURASIP Journal on Audio, Speech, and Music Processing | 2015
Keywords
Automatic speech recognition; Deep learning for speech; Acoustic model adaptation; Semi-supervised training; SPEECH; ALGORITHM; FEATURES
DOI
10.1186/s13636-015-0058-5
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Manual transcription of audio databases for the development of automatic speech recognition (ASR) systems is a costly and time-consuming process. When deriving acoustic models adapted to a specific application, or in low-resource scenarios, it is therefore essential to explore alternatives capable of improving speech recognition results. In this paper, we investigate the relevance of the characteristics of foreign data, in particular its domain and language, when this data is used as an auxiliary source for training ASR acoustic models based on deep neural networks (DNNs). The acoustic models are evaluated on a challenging bilingual database within the scope of the MediaParl project. Experimental results suggest that in-language (but out-of-domain) data is more beneficial than in-domain (but out-of-language) data when employed in either supervised or semi-supervised training of DNNs. The best-performing ASR system, an HMM/GMM acoustic model that exploits a DNN as a discriminatively trained feature extractor, outperforms the best-performing HMM/DNN hybrid by about 5 % relative in terms of word error rate (WER). The accumulated relative WER reduction with respect to the MFCC-based HMM/GMM baseline is about 30 %.
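The abstract reports its results as relative WER differences. As a minimal sketch (not code from the paper), the snippet below computes WER from a word-level Levenshtein alignment and the relative reduction of one system's WER against a reference system. The function names `word_error_rate`, `relative_reduction` and `combined_reduction` are hypothetical, and every number used with them is an illustrative assumption, not a result from the study.

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference),
    computed with a word-level Levenshtein (edit-distance) alignment."""
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    # At the start of iteration i, prev[j] is the edit distance
    # between reference[:i-1] and hypothesis[:j].
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m  # i deletions align reference[:i] with an empty hypothesis
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of reference[i-1]
                          curr[j - 1] + 1,     # insertion of hypothesis[j-1]
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[m] / n


def relative_reduction(baseline_wer: float, system_wer: float) -> float:
    """Relative WER reduction of a system with respect to a reference (baseline) system."""
    return (baseline_wer - system_wer) / baseline_wer


def combined_reduction(*stepwise: float) -> float:
    """Overall relative reduction when each step is measured against the previous system."""
    remaining = 1.0
    for r in stepwise:
        remaining *= 1.0 - r  # fraction of the previous WER that is left after this step
    return 1.0 - remaining


if __name__ == "__main__":
    # Hypothetical example sentences and WER values, for illustration only.
    ref = "the cat sat on the mat".split()
    hyp = "the cat sit on mat".split()
    print(round(word_error_rate(ref, hyp), 4))
    print(relative_reduction(30.0, 21.0))
    print(round(combined_reduction(0.05, 0.25), 4))
```

In this hypothetical example, the hypothesis has one substitution ("sat" to "sit") and one deletion ("the"), so its WER is 2/6. A drop from 30.0 % to 21.0 % WER is a 30 % relative reduction. Relative gains measured against different reference systems compose multiplicatively rather than additively: successive 5 % and 25 % reductions give 1 - 0.95 × 0.75 = 0.2875 overall.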
Pages: 1-10
Page count: 10
Related papers (50 in total)
  • [31] DNN-Based Speech Enhancement Using Soft Audible Noise Masking for Wind Noise Reduction
    Bai, Haichuan
    Ge, Fengpei
    Yan, Yonghong
    CHINA COMMUNICATIONS, 2018, 15 (09) : 235 - 243
  • [32] DNN-BASED SCORING OF LANGUAGE LEARNERS' PROFICIENCY USING LEARNERS' SHADOWINGS AND NATIVE LISTENERS' RESPONSIVE SHADOWINGS
    Kabashima, Suguru
    Inoue, Yuusuke
    Saito, Daisuke
    Minematsu, Nobuaki
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018 : 971 - 978
  • [33] Semi-supervised DNN training with word selection for ASR
    Vesely, Karel
    Burget, Lukas
    Cernocky, Jan Honza
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017 : 3687 - 3691
  • [34] Cost-Driven Off-Loading for DNN-Based Applications Over Cloud, Edge, and End Devices
    Lin, Bin
    Huang, Yinhao
    Zhang, Jianshan
    Hu, Junqin
    Chen, Xing
    Li, Jun
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2020, 16 (08) : 5456 - 5466
  • [35] On quantifying the quality of acoustic models in hybrid DNN-HMM ASR
    Dighe, Pranay
    Asaei, Afsaneh
    Bourlard, Herve
    SPEECH COMMUNICATION, 2020, 119 : 24 - 35
  • [36] DNN-based performance measures for predicting error rates in automatic speech recognition and optimizing hearing aid parameters
    Martinez, Angel Mario Castro
    Gerlach, Lukas
    Paya-Vaya, Guillermo
    Hermansky, Hynek
    Ooster, Jasper
    Meyer, Bernd T.
    SPEECH COMMUNICATION, 2019, 106 : 44 - 56
  • [37] AN EXTENDED EXPERIMENTAL INVESTIGATION OF DNN UNCERTAINTY PROPAGATION FOR NOISE ROBUST ASR
    Nathwani, Karan
    Morales-Cordovilla, Juan A.
    Sivasankaran, Sunit
    Illina, Irina
    Vincent, Emmanuel
    2017 HANDS-FREE SPEECH COMMUNICATIONS AND MICROPHONE ARRAYS (HSCMA 2017), 2017 : 26 - 30
  • [38] A MAXIMUM LIKELIHOOD APPROACH TO MULTI-OBJECTIVE LEARNING USING GENERALIZED GAUSSIAN DISTRIBUTIONS FOR DNN-BASED SPEECH ENHANCEMENT
    Niu, Shu-Tong
    Du, Jun
    Chai, Li
    Lee, Chin-Hui
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020 : 6229 - 6233
  • [39] Combining hybrid DNN-HMM ASR systems with attention-based models using lattice rescoring
    Li, Qiujia
    Zhang, Chao
    Woodland, Philip C.
    SPEECH COMMUNICATION, 2023, 147 : 12 - 21
  • [40] EXPLOITING A 'GAZE-LOMBARD EFFECT' TO IMPROVE ASR PERFORMANCE IN ACOUSTICALLY NOISY SETTINGS
    Cooke, Neil
    Shen, Ao
    Russell, Martin
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,