Language Adaptive DNNs for Improved Low Resource Speech Recognition

Cited by: 7

Authors:

Mueller, Markus [1]

Stueker, Sebastian [1]

Waibel, Alex [1]

Affiliations:

[1] Karlsruhe Inst Technol, Inst Anthropomat & Robot, Karlsruhe, Germany

Source:

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016

Keywords:

Multilingual acoustic modelling; neural networks; low-resource ASR

DOI:

10.21437/Interspeech.2016-1143

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206 ; 082403 ;

Abstract:

Deep Neural Network (DNN) acoustic models are commonly used in today's state-of-the-art speech recognition systems. As neural networks are a data-driven method, the amount of available training data directly impacts the performance. In the past, several studies have shown that multilingual training of DNNs leads to improvements, especially in resource-constrained tasks in which only limited training data in the target language is available. Previous studies have also shown that speaker adaptation can be performed successfully on DNNs. This is achieved by adding speaker information (e.g. i-Vectors) as additional input features. Based on the idea of adding additional features, we here present a method for adding language information to the input features of the network. Preliminary experiments have shown improvements by providing supervised information about language identity to the network. In this work, we extended this approach by training a neural network to encode language-specific features. We extracted those features unsupervised and used them to provide additional cues to the DNN acoustic model during training. Our results show that augmenting acoustic input features with this language code enabled the network to better capture language-specific peculiarities. This improved the performance of systems trained using data from multiple languages.
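
The input augmentation described in the abstract can be illustrated with a minimal sketch. It assumes the simpler supervised variant mentioned there, in which language identity is supplied directly, here as a one-hot vector appended to every acoustic frame. The function names, feature dimensions, and language list are hypothetical and not taken from the paper; in the paper's main approach the appended vector would instead be a language code extracted by a separately trained neural network.

```python
from typing import List, Sequence


def one_hot(index: int, size: int) -> List[float]:
    """Return a one-hot vector of length `size` with a 1.0 at `index`."""
    if not 0 <= index < size:
        raise ValueError("index out of range")
    return [1.0 if i == index else 0.0 for i in range(size)]


def augment_frames(
    frames: Sequence[Sequence[float]],
    language_code: Sequence[float],
) -> List[List[float]]:
    """Append the same language vector to every acoustic frame of an utterance."""
    return [list(frame) + list(language_code) for frame in frames]


# Hypothetical toy data: 3 frames of 4-dimensional acoustic features.
frames = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
]

# Hypothetical language inventory; the utterance is assumed to be English.
languages = ["de", "en", "fr"]
lang_code = one_hot(languages.index("en"), len(languages))

augmented = augment_frames(frames, lang_code)
print(len(augmented), len(augmented[0]))  # 3 7
```

Each augmented frame has the original acoustic dimensions followed by the language vector, so the network's input layer would need to be widened by the length of that vector. Any fixed-length vector, such as a learned language code, could be passed to `augment_frames` in place of the one-hot vector.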


Pages: 3878-3882

Page count: 5

50 records in total

[21] A Language Model Optimization Method for Turkish Automatic Speech Recognition System [J].

Oyucu, Saadin ;

Polat, Huseyin .

JOURNAL OF POLYTECHNIC-POLITEKNIK DERGISI, 2023, 26 (03) :1167-1178

[22] A survey on automatic speech recognition systems for Portuguese language and its variations [J].

de Lima, Thales Aguiar ;

Da Costa-Abreu, Marjory .

COMPUTER SPEECH AND LANGUAGE, 2020, 62 (62)

[23] Application of virtual human sign language translation based on speech recognition [J].

Li, Xin ;

Yang, Shuying ;

Guo, Haiming .

SPEECH COMMUNICATION, 2023, 152

[24] Noise Robust End-to-End Speech Recognition For Bangla Language [J].

Sumit, Sakhawat Hosain ;

Al Muntasir, Tareq ;

Zaman, M. M. Arefin ;

Nandi, Rabindra Nath ;

Sourov, Tanvir .

2018 INTERNATIONAL CONFERENCE ON BANGLA SPEECH AND LANGUAGE PROCESSING (ICBSLP), 2018

[25] Empirical study of neural network language models for Arabic speech recognition [J].

Emami, Ahmad ;

Mangu, Lidia .

2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, 2007, :147-152

[26] AdaStreamLite: Environment-adaptive Streaming Speech Recognition on Mobile Devices [J].

Wei, Yuheng ;

Xiong, Jie ;

Liu, Hui ;

Yu, Yingtao ;

Pan, Jiangtao ;

Du, Junzhao .

PROCEEDINGS OF THE ACM ON INTERACTIVE MOBILE WEARABLE AND UBIQUITOUS TECHNOLOGIES-IMWUT, 2023, 7 (04)

[27] Model-based Articulatory Phonetic Features for Improved Speech Recognition [J].

Huang, Guangpu ;

Er, Meng Joo .

2012 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2012

[28] A Curriculum Learning Method for Improved Noise Robustness in Automatic Speech Recognition [J].

Braun, Stefan ;

Neil, Daniel ;

Liu, Shih-Chii .

2017 25TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2017, :548-552

[29] Multi-cultural speech emotion recognition using language and speaker cues [J].

Pandey, Sandeep Kumar ;

Shekhawat, Hanumant Singh ;

Prasanna, S. R. M. .

BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2023, 83

[30] Investigation on LSTM Recurrent N-gram Language Models for Speech Recognition [J].

Tueske, Zoltan ;

Schlueter, Ralf ;

Ney, Hermann .

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, :3358-3362
