Exploiting foreign resources for DNN-based ASR

Cited by: 9
Authors
Motlicek, Petr [1]
Imseng, David [1]
Potard, Blaise [1]
Garner, Philip N. [1]
Himawan, Ivan [1]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
Source
EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING | 2015
Keywords
Automatic speech recognition; Deep learning for speech; Acoustic model adaptation; Semi-supervised training; SPEECH; ALGORITHM; FEATURES
DOI
10.1186/s13636-015-0058-5
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Manual transcription of audio databases for the development of automatic speech recognition (ASR) systems is a costly and time-consuming process. In the context of deriving acoustic models adapted to a specific application, or in low-resource scenarios, it is therefore essential to explore alternatives capable of improving speech recognition results. In this paper, we investigate the relevance of foreign data characteristics, in particular domain and language, when using this data as an auxiliary data source for training ASR acoustic models based on deep neural networks (DNNs). The acoustic models are evaluated on a challenging bilingual database within the scope of the MediaParl project. Experimental results suggest that in-language (but out-of-domain) data is more beneficial than in-domain (but out-of-language) data when employed in either supervised or semi-supervised training of DNNs. The best performing ASR system, an HMM/GMM acoustic model that exploits a DNN as a discriminatively trained feature extractor, outperforms the best performing HMM/DNN hybrid by about 5% relative in terms of word error rate (WER). The accumulated relative gain with respect to the MFCC-HMM/GMM baseline is about 30% in WER.
Pages: 1 / 10
Page count: 10
Related papers
50 in total
  • [21] INTEGRATING DNN-BASED AND SPATIAL CLUSTERING-BASED MASK ESTIMATION FOR ROBUST MVDR BEAMFORMING
    Nakatani, Tomohiro
    Ito, Nobutaka
    Higuchi, Takuya
    Araki, Shoko
    Kinoshita, Keisuke
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 286 - 290
  • [22] Enhancement of DNN-based multilabel classification by grouping labels based on data imbalance and label correlation
    Chen, Ling
    Wang, Yuhong
    Li, Hao
    PATTERN RECOGNITION, 2022, 132
  • [23] Semi-Supervised Training of DNN-Based Acoustic Model for ATC Speech Recognition
    Smidl, Lubos
    Svec, Jan
    Prazak, Ales
    Trmal, Jan
    SPEECH AND COMPUTER (SPECOM 2018), 2018, 11096 : 646 - 655
  • [24] Prosodic Information-Assisted DNN-based Mandarin Spontaneous-Speech Recognition
    Deng, Yu-Chih
    Lin, Cheng-Hsin
    Liao, Yuan-Fu
    Wang, Yih-Ru
    Chen, Sin-Horng
    PROCEEDINGS OF 2020 23RD CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (ORIENTAL-COCOSDA 2020), 2020, : 134 - 138
  • [25] ONLINE INTEGRATION OF DNN-BASED AND SPATIAL CLUSTERING-BASED MASK ESTIMATION FOR ROBUST MVDR BEAMFORMING
    Matsui, Yutaro
    Nakatani, Tomohiro
    Delcroix, Marc
    Kinoshita, Keisuke
    Ito, Nobutaka
    Araki, Shoko
    Makino, Shoji
    2018 16TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2018, : 71 - 75
  • [26] DNN-Based Surrogate Modeling-Based Feasible Performance Reliability Design Methodology for Aircraft Engine
    Cao, Dalu
    Bai, Guang-Chen
    IEEE ACCESS, 2020, 8 : 229201 - 229218
  • [27] DNN adaptation by automatic quality estimation of ASR hypotheses
    Falavigna, Daniele
    Matassoni, Marco
    Jalalvand, Shahab
    Negri, Matteo
    Turchi, Marco
    COMPUTER SPEECH AND LANGUAGE, 2017, 46 : 585 - 604
  • [28] CONSISTENT DNN UNCERTAINTY TRAINING AND DECODING FOR ROBUST ASR
    Nathwani, Karan
    Vincent, Emmanuel
    Illina, Irina
    2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 185 - 192
  • [29] Stochastic DNN-HMM Training for Robust ASR
    Lee, Kang Hyun
    Kang, Woo Hyun
    Lee, Hyeonseung
    Kim, Nam Soo
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 177 - 182
  • [30] VOCAL TRACT LENGTH NORMALISATION APPROACHES TO DNN-BASED CHILDREN'S AND ADULTS' SPEECH RECOGNITION
    Serizel, Romain
    Giuliani, Diego
    2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY SLT 2014, 2014, : 135 - 140