Language Adaptive DNNs for Improved Low Resource Speech Recognition

Cited: 7
Authors
Mueller, Markus [1 ]
Stueker, Sebastian [1 ]
Waibel, Alex [1 ]
Affiliations
[1] Karlsruhe Inst Technol, Inst Anthropomat & Robot, Karlsruhe, Germany
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016
Keywords
Multilingual acoustic modelling; neural networks; low-resource ASR
DOI
10.21437/Interspeech.2016-1143
Chinese Library Classification
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
Deep Neural Network (DNN) acoustic models are commonly used in today's state-of-the-art speech recognition systems. Because neural networks are a data-driven method, the amount of available training data directly impacts their performance. Several studies have shown that multilingual training of DNNs leads to improvements, especially in resource-constrained tasks in which only limited training data in the target language is available. Previous studies have also shown that speaker adaptation can be performed successfully on DNNs by adding speaker information (e.g., i-vectors) as additional input features. Building on this idea of augmenting the input, we present a method for adding language information to the input features of the network. Preliminary experiments showed improvements when supervised information about language identity was provided to the network. In this work, we extend this approach by training a neural network to encode language-specific features. We extract these features in an unsupervised manner and use them to provide additional cues to the DNN acoustic model during training. Our results show that augmenting the acoustic input features with this language code enables the network to better capture language-specific peculiarities, improving the performance of systems trained on data from multiple languages.
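The record contains no code, but the mechanism the abstract describes, concatenating a per-utterance language code to each acoustic feature vector before the DNN acoustic model, in direct analogy to i-vector speaker adaptation, is easy to sketch. The following is a minimal PyTorch illustration; the layer sizes, feature dimensions, number of output targets, and the random inputs are assumptions for demonstration, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

# Illustrative dimensions (assumptions, not the paper's exact setup):
FEAT_DIM = 40 * 11   # e.g. 40 log-mel filterbanks, 11 stacked frames
LANG_DIM = 16        # size of the language code vector
NUM_TARGETS = 3000   # context-dependent phone state targets

class LanguageAdaptiveDNN(nn.Module):
    """Feed-forward acoustic model whose input is the acoustic feature
    vector concatenated with a language code, analogous to appending
    i-vectors for speaker adaptation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(FEAT_DIM + LANG_DIM, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, NUM_TARGETS),
        )

    def forward(self, feats, lang_code):
        # feats: (batch, FEAT_DIM); lang_code: (batch, LANG_DIM).
        # The language code gives the network an explicit cue about
        # which language each training example comes from.
        return self.net(torch.cat([feats, lang_code], dim=-1))

# Usage sketch: the language code could be a supervised one-hot
# language ID or, as in the paper, a feature vector extracted
# without supervision by a separately trained encoder network.
model = LanguageAdaptiveDNN()
feats = torch.randn(8, FEAT_DIM)       # dummy acoustic frames
lang_code = torch.randn(8, LANG_DIM)   # dummy language codes
logits = model(feats, lang_code)       # shape: (8, NUM_TARGETS)
```

The design point of the paper is that only the input representation changes: the same multilingual training data and targets are used, while the appended code lets the shared network model language-specific variation.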
Pages: 3878-3882
Page count: 5