Domain Adaptation Speech-to-Text for Low-Resource European Portuguese Using Deep Learning

Cited by: 2

Authors:

Medeiros, Eduardo [1]

Corado, Leonel [1]

Rato, Luis [1,2]

Quaresma, Paulo [1,2]

Salgueiro, Pedro [1,2]

Affiliations:

[1] Univ Evora, Escola Ciencias & Tecnol, P-7000671 Evora, Portugal

[2] Univ Evora, Ctr ALGORITMI, Vista Lab, P-7000671 Evora, Portugal

Source:

FUTURE INTERNET | 2023 / Volume 15 / Issue 05

Keywords:

machine learning; deep learning; deep neural networks; speech-to-text; automatic speech recognition; NVIDIA NeMo; GPUs; data-centric; Portuguese language; RECOGNITION;

DOI:

10.3390/fi15050159

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Automatic speech recognition (ASR), commonly known as speech-to-text, is the process of transcribing audio recordings into text, i.e., transforming speech into the corresponding sequence of words. This paper presents the optimization and evaluation of a deep learning ASR system for European Portuguese. We present a pipeline with stages for data acquisition, analysis, pre-processing, model creation, and evaluation. The proposed transfer learning approach has three parts: a model optimized for English as the starting point; a European Portuguese target; and a source from a different domain, a multiple-variant Portuguese dataset composed mostly of Brazilian Portuguese, that contributes to the transfer process. Domain adaptation was investigated between European Portuguese and mixed (mostly Brazilian) Portuguese. The optimization and evaluation used the NVIDIA NeMo framework implementing the QuartzNet15x5 architecture, which is based on 1D time-channel separable convolutions. Following this data-centric transfer learning approach, the model was optimized to a state-of-the-art word error rate (WER) of 0.0503.
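
Note on the metric: WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the system output, divided by the number of reference words, so 0.0503 corresponds to roughly five errors per hundred reference words. The sketch below illustrates that standard definition only. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the abstract does not state how the authors normalized text (casing, punctuation) before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization is plain whitespace splitting, which is
    an assumption and not necessarily the normalization used in the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "o gato come no sofá" against the reference "o gato dorme no sofá" counts one substitution over five reference words. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.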

Pages: 16