Language Adaptive DNNs for Improved Low Resource Speech Recognition

Cited by: 7
Authors
Mueller, Markus [1]
Stueker, Sebastian [1]
Waibel, Alex [1]
Affiliation
[1] Karlsruhe Inst Technol, Inst Anthropomat & Robot, Karlsruhe, Germany
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016
Keywords
Multilingual acoustic modelling; neural networks; low-resource ASR
DOI
10.21437/Interspeech.2016-1143
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Deep Neural Network (DNN) acoustic models are commonly used in today's state-of-the-art speech recognition systems. Because neural networks are a data-driven method, the amount of available training data directly affects their performance. Several past studies have shown that multilingual training of DNNs leads to improvements, especially in resource-constrained tasks where only limited training data in the target language is available. Previous studies have also shown that speaker adaptation can be performed successfully on DNNs by adding speaker information (e.g. i-Vectors) as additional input features. Building on this idea of augmenting the input, we present a method for adding language information to the input features of the network. Preliminary experiments showed improvements when supervised information about language identity was provided to the network. In this work, we extended this approach by training a neural network to encode language-specific features. We extracted these features in an unsupervised manner and used them to provide additional cues to the DNN acoustic model during training. Our results show that augmenting the acoustic input features with this language code enabled the network to better capture language-specific peculiarities, which improved the performance of systems trained on data from multiple languages.
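The input-augmentation idea described in the abstract (appending an auxiliary vector, like an i-Vector or a language code, to each acoustic input frame) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a separately trained language encoder (not shown) emits one fixed-length code per frame, and that these codes are averaged into a single utterance-level vector. The helper names `utterance_code` and `augment_features` and the averaging step are hypothetical.

```python
from typing import List, Sequence

Frame = List[float]


def utterance_code(frame_codes: Sequence[Sequence[float]]) -> List[float]:
    """Average frame-level language codes into one utterance-level vector.

    Assumption: the language encoder (not shown) emits one code per frame;
    averaging is an illustrative choice, not the paper's stated recipe.
    """
    if not frame_codes:
        raise ValueError("need at least one frame-level code")
    dim = len(frame_codes[0])
    sums = [0.0] * dim
    for code in frame_codes:
        if len(code) != dim:
            raise ValueError("all codes must have the same dimension")
        for i, value in enumerate(code):
            sums[i] += value
    return [s / len(frame_codes) for s in sums]


def augment_features(frames: Sequence[Sequence[float]],
                     lang_code: Sequence[float]) -> List[Frame]:
    """Append the same language code to every acoustic frame (i-Vector style)."""
    return [list(frame) + list(lang_code) for frame in frames]


if __name__ == "__main__":
    acoustic = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # 2 frames x 3 dims (toy values)
    codes = [[1.0, 0.0], [0.0, 1.0]]              # hypothetical encoder outputs
    code = utterance_code(codes)                  # [0.5, 0.5]
    print(augment_features(acoustic, code))
    # prints [[0.1, 0.2, 0.3, 0.5, 0.5], [0.4, 0.5, 0.6, 0.5, 0.5]]
```

Because the same code is appended to every frame of an utterance, the acoustic model's input dimension grows by the code length, and the network can condition its frame-level predictions on language identity without any change to its architecture.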
Pages: 3878-3882
Number of pages: 5
Related papers (50 in total)
  • [1] Distance-Aware DNNs for Robust Speech Recognition
    Miao, Yajie
    Metze, Florian
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 761 - 765
  • [2] Frontier Research on Low-Resource Speech Recognition Technology
    Slam, Wushour
    Li, Yanan
    Urouvas, Nurmamet
    SENSORS, 2023, 23 (22)
  • [3] A STUDY OF RANK-CONSTRAINED MULTILINGUAL DNNS FOR LOW-RESOURCE ASR
    Sahraeian, Reza
    Van Compernolle, Dirk
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5420 - 5424
  • [4] How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition
    Sterpu, George
    Saam, Christian
    Harte, Naomi
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 1052 - 1064
  • [5] Low-resource Sinhala Speech Recognition using Deep Learning
    Karunathilaka, Hirunika
    Welgama, Viraj
    Nadungodage, Thilini
    Weerasinghe, Ruvan
    2020 20TH INTERNATIONAL CONFERENCE ON ADVANCES IN ICT FOR EMERGING REGIONS (ICTER-2020), 2020, : 196 - 201
  • [6] ISI ASR System for the Low Resource Speech Recognition Challenge for Indian Languages
    Billa, Jayadev
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3207 - 3211
  • [7] Multilingual Meta-Transfer Learning for Low-Resource Speech Recognition
    Zhou, Rui
    Koshikawa, Takaki
    Ito, Akinori
    Nose, Takashi
    Chen, Chia-Ping
    IEEE ACCESS, 2024, 12 : 158493 - 158504
  • [8] LOW-RESOURCE LANGUAGE IDENTIFICATION FROM SPEECH USING TRANSFER LEARNING
    Feng, Kexin
    Chaspari, Theodora
    2019 IEEE 29TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2019
  • [9] GATED CONVOLUTIONAL NETWORKS BASED HYBRID ACOUSTIC MODELS FOR LOW RESOURCE SPEECH RECOGNITION
    Kang, Jian
    Zhang, Wei-Qiang
    Liu, Jia
    2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 157 - 164
  • [10] Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-Resource Speech Recognition
    Yi, Cheng
    Zhou, Shiyu
    Xu, Bo
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 (28) : 788 - 792