Language Adaptive DNNs for Improved Low Resource Speech Recognition

Cited by: 7
Authors
Mueller, Markus [1 ]
Stueker, Sebastian [1 ]
Waibel, Alex [1 ]
Affiliations
[1] Karlsruhe Inst Technol, Inst Anthropomat & Robot, Karlsruhe, Germany
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016
Keywords
Multilingual acoustic modelling; neural networks; low-resource ASR;
DOI
10.21437/Interspeech.2016-1143
Chinese Library Classification
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
Deep Neural Network (DNN) acoustic models are commonly used in today's state-of-the-art speech recognition systems. As neural networks are a data-driven method, the amount of available training data directly impacts performance. In the past, several studies have shown that multilingual training of DNNs leads to improvements, especially in resource-constrained tasks in which only limited training data in the target language is available. Previous studies have shown that speaker adaptation can be successfully performed on DNNs by adding speaker information (e.g. i-Vectors) as additional input features. Based on this idea of adding auxiliary features, we present a method for adding language information to the input features of the network. Preliminary experiments showed improvements when supervised information about language identity was provided to the network. In this work, we extend this approach by training a neural network to encode language-specific features. We extract these features in an unsupervised manner and use them to provide additional cues to the DNN acoustic model during training. Our results show that augmenting acoustic input features with this language code enables the network to better capture language-specific peculiarities, improving the performance of systems trained on data from multiple languages.
Pages: 3878-3882
Page count: 5
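The core idea described in the abstract, appending a per-language feature vector to every frame of acoustic input, analogous to i-Vector speaker adaptation, can be sketched as follows. This is a minimal illustration under assumed dimensions (40-dim acoustic frames, a 3-language one-hot code); the function name `augment_with_language_code` and all shapes are hypothetical, not from the paper.

```python
import numpy as np

def augment_with_language_code(features, lang_code):
    """Append a fixed language code vector to every acoustic frame.

    features:  (num_frames, feat_dim) acoustic features (e.g. log-mel)
    lang_code: (code_dim,) language feature vector -- a supervised
               one-hot language identity, or (per the paper) an
               unsupervised code learned by a separate network
    Returns:   (num_frames, feat_dim + code_dim) DNN input
    """
    # Repeat the single language code once per frame, then concatenate
    # it onto each frame's acoustic features along the feature axis.
    tiled = np.tile(lang_code, (features.shape[0], 1))
    return np.concatenate([features, tiled], axis=1)

# Hypothetical example: 100 frames of 40-dim features, 3 languages.
frames = np.random.randn(100, 40)
code = np.array([0.0, 1.0, 0.0])  # supervised language identity (one-hot)
net_input = augment_with_language_code(frames, code)
print(net_input.shape)  # (100, 43)
```

The DNN acoustic model then consumes `net_input` in place of the raw features; at test time the same code (or an unsupervised estimate of it) is appended to each utterance's frames.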