Language Adaptive DNNs for Improved Low Resource Speech Recognition

Cited by: 7
Authors
Mueller, Markus [1 ]
Stueker, Sebastian [1 ]
Waibel, Alex [1 ]
Affiliations
[1] Karlsruhe Inst Technol, Inst Anthropomat & Robot, Karlsruhe, Germany
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016
Keywords
Multilingual acoustic modelling; neural networks; low-resource ASR;
DOI
10.21437/Interspeech.2016-1143
Chinese Library Classification
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
Deep Neural Network (DNN) acoustic models are commonly used in today's state-of-the-art speech recognition systems. As neural networks are a data-driven method, the amount of available training data directly impacts performance. In the past, several studies have shown that multilingual training of DNNs leads to improvements, especially in resource-constrained tasks in which only limited training data in the target language is available. Previous studies have shown that speaker adaptation can be successfully performed on DNNs by adding speaker information (e.g. i-Vectors) as additional input features. Based on this idea of adding auxiliary features, we present a method for adding language information to the input features of the network. Preliminary experiments showed improvements when supervised information about language identity was provided to the network. In this work, we extend this approach by training a neural network to encode language-specific features. We extract these features in an unsupervised manner and use them to provide additional cues to the DNN acoustic model during training. Our results show that augmenting acoustic input features with this language code enables the network to better capture language-specific peculiarities, improving the performance of systems trained on data from multiple languages.
Pages: 3878-3882
Page count: 5
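The core idea described in the abstract, appending a per-language feature vector to every frame of acoustic input, analogous to i-Vector speaker adaptation, can be sketched as follows. This is a minimal illustration under assumed dimensions (40-dim acoustic frames, a 3-language one-hot code); the function name `augment_with_language_code` and all shapes are hypothetical, not from the paper.

```python
import numpy as np

def augment_with_language_code(features, lang_code):
    """Append a fixed language code vector to every acoustic frame.

    features:  (num_frames, feat_dim) acoustic features (e.g. log-mel)
    lang_code: (code_dim,) language feature vector -- a supervised
               one-hot language identity, or (per the paper) an
               unsupervised code learned by a separate network
    Returns:   (num_frames, feat_dim + code_dim) DNN input
    """
    # Repeat the single language code once per frame, then concatenate
    # it onto each frame's acoustic features along the feature axis.
    tiled = np.tile(lang_code, (features.shape[0], 1))
    return np.concatenate([features, tiled], axis=1)

# Hypothetical example: 100 frames of 40-dim features, 3 languages.
frames = np.random.randn(100, 40)
code = np.array([0.0, 1.0, 0.0])  # supervised language identity (one-hot)
net_input = augment_with_language_code(frames, code)
print(net_input.shape)  # (100, 43)
```

The DNN acoustic model then consumes `net_input` in place of the raw features; at test time the same code (or an unsupervised estimate of it) is appended to each utterance's frames.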