A review on speech recognition approaches and challenges for Portuguese: exploring the feasibility of fine-tuning large-scale end-to-end models

Authors
Li, Yan [1]
Wang, Yapeng [1]
Hoi, Lap Man [1]
Yang, Dingcheng [3]
Im, Sio-Kei [2]
Affiliations
[1] Macao Polytech Univ, Fac Appl Sci, Macau, Peoples R China
[2] Macao Polytech Univ, Macau, Peoples R China
[3] Nanchang Univ, Sch Informat Engn, Nanchang, Peoples R China
Source
Keywords
Portuguese speech recognition; Review; End-to-end models
DOI
10.1186/s13636-024-00388-w
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Automatic speech recognition has become an important bridge for human-computer interaction and is widely applied in many fields. Portuguese speech recognition is gradually receiving attention because of the language's distinctive position. However, relatively scarce data resources have constrained the development and application of Portuguese speech recognition systems, and the neglect of accent variation further hinders their adoption. This study reviews the progress of end-to-end technology on the Portuguese speech recognition task. It discusses relevant research in two directions, Brazilian Portuguese recognition and European Portuguese recognition, and catalogs the available corpus resources for prospective researchers. Then, taking European Portuguese speech recognition as an example, it benchmarks Fairseq-S2T and Whisper on a 500-hour European Portuguese dataset to estimate the performance of large-scale pre-trained models and fine-tuning techniques. Whisper obtained a word error rate (WER) of 5.11%, which indicates that multilingual joint training can enhance generalization ability. Finally, in light of the existing problems in Portuguese speech recognition, it explores future research directions, offering new ideas for the next stage of research and system construction.
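The WER figure quoted in the abstract is, by the standard definition, the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a recognizer's hypothesis, divided by the number of reference words. The following is a minimal sketch of that definition only. The function name `word_error_rate` and the sample sentences are illustrative assumptions, and this record does not describe the text normalization or scoring tool the authors used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts, not taken from the paper's dataset.
    reference = "o gato dorme no sofá"
    hypothesis = "o gato dorme sofá"
    print(f"WER = {word_error_rate(reference, hypothesis):.2%}")
```

In practice, both transcripts are usually normalized (case, punctuation, numerals) before scoring, and that choice can change the reported WER noticeably.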
Pages: 13