Enhancing Automatic Speech Recognition With Personalized Models: Improving Accuracy Through Individualized Fine-Tuning

Cited by: 0
Authors
Brydinskyi, Vitalii [1]
Sabodashko, Dmytro [1]
Khoma, Yuriy [1]
Podpora, Michal [2]
Konovalov, Alexander [3]
Khoma, Volodymyr [4]
Affiliations
[1] Lviv Polytech Natl Univ, Inst Comp Technol Automat & Metrol, UA-79013 Lvov, Ukraine
[2] Opole Univ Technol, Dept Comp Sci, PL-45758 Opole, Poland
[3] Vidby AG, CH-6343 Risch Rotkreuz, Switzerland
[4] Opole Univ Technol, Dept Control Engn, PL-45758 Opole, Poland
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Automatic speech recognition; Transformers; Natural language processing; Speech processing; Sound recognition
DOI
10.1109/ACCESS.2024.3443811
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Automatic speech recognition (ASR) systems have become increasingly popular in recent years due to their ability to convert spoken language into text. Despite their widespread use, existing speaker-independent ASR systems frequently struggle with variations in speaking style, accent, and vocal characteristics, which can lead to recognition errors. This study examines the feasibility of personalized ASR systems that adapt to the unique voice attributes of individual speakers, thereby enhancing recognition accuracy. It provides an overview of our methodology, focusing on the design, development, and evaluation of both speaker-independent and personalized ASR systems. The dataset comprised diverse speakers selected from three extensive datasets (TedLIUM-3, CommonVoice, and GoogleVoice), demonstrating the capability of our methodology to accommodate various accents and the challenges of both natural and synthetic voices. In terms of signal classification and interpretation, the personalized model outperformed the speaker-independent variant, improving recognition accuracy for individual speakers by up to ~3% for natural voices and ~10% for synthetic voices. Our findings demonstrate that personalized ASR systems can significantly improve speech recognition accuracy for individual speakers and highlight the importance of adapting ASR models to individual voices.
Pages: 116649-116656
Number of pages: 8