A review on speech recognition approaches and challenges for Portuguese: exploring the feasibility of fine-tuning large-scale end-to-end models

Cited by: 0

Authors:

Li, Yan [1]

Wang, Yapeng [1]

Hoi, Lap Man [1]

Yang, Dingcheng [3]

Im, Sio-Kei [2]

Affiliations:

[1] Macao Polytech Univ, Fac Appl Sci, Macau, Peoples R China

[2] Macao Polytech Univ, Macau, Peoples R China

[3] Nanchang Univ, Sch Informat Engn, Nanchang, Peoples R China

Source:

EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING | Year 2025 / Volume 2025 / Issue 01

Keywords:

Portuguese speech recognition; Review; End-to-end models;

DOI:

10.1186/s13636-024-00388-w

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Automatic speech recognition has become an important bridge for human-computer interaction and is widely applied in many fields. Portuguese speech recognition is gradually receiving attention because of the language's unique linguistic status. However, relatively scarce data resources have constrained the development and application of Portuguese speech recognition systems, and the neglect of accent issues further hinders their adoption. This study reviews the research progress of end-to-end technology for the Portuguese speech recognition task. It discusses relevant research in two directions, Brazilian Portuguese recognition and European Portuguese recognition, and catalogs the available corpus resources for prospective researchers. Then, taking European Portuguese speech recognition as an example, it uses Fairseq-S2T and Whisper as benchmarks, tested on a 500-h European Portuguese dataset, to estimate the performance of large-scale pre-trained models and fine-tuning techniques. Whisper obtained a WER of 5.11%, which suggests that multilingual joint training can enhance generalization ability. Finally, in light of the existing problems in Portuguese speech recognition, it explores future research directions, offering new ideas for the next stage of research and system construction.

Pages: 13
