Enhancing Automatic Speech Recognition With Personalized Models: Improving Accuracy Through Individualized Fine-Tuning

Cited by: 0
Authors
Brydinskyi, Vitalii [1]
Sabodashko, Dmytro [1]
Khoma, Yuriy [1]
Podpora, Michal [2]
Konovalov, Alexander [3]
Khoma, Volodymyr [4]
Affiliations
[1] Lviv Polytech Natl Univ, Inst Comp Technol Automat & Metrol, UA-79013 Lvov, Ukraine
[2] Opole Univ Technol, Dept Comp Sci, PL-45758 Opole, Poland
[3] Vidby AG, CH-6343 Risch Rotkreuz, Switzerland
[4] Opole Univ Technol, Dept Control Engn, PL-45758 Opole, Poland
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Automatic speech recognition; Transformers; Natural language processing; Speech processing; Sound recognition
DOI
10.1109/ACCESS.2024.3443811
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Automatic speech recognition (ASR) systems have become increasingly popular in recent years due to their ability to convert spoken language into text. Nonetheless, despite their widespread use, existing speaker-independent ASR systems frequently encounter challenges related to variations in speaking styles, accents, and vocal characteristics, leading to potential recognition inaccuracies. This study delves into the feasibility of personalized ASR systems that adapt to the unique voice attributes of individual speakers, thereby enhancing recognition accuracy. It provides an overview of our methodology, focusing on the design, development, and evaluation of both speaker-independent and personalized ASR systems. The dataset used included diverse speakers selected from three extensive datasets: TedLIUM-3, CommonVoice, and GoogleVoice, demonstrating the capability of our methodology to accommodate various accents and challenges of both natural and synthetic voices. In terms of signal classification and interpretation, the personalized model eclipsed the speaker-independent variant, registering an enhancement of up to ~3% for natural voices and ~10% for synthetic voices in recognition accuracy for individual speakers. Our findings demonstrate that personalized ASR systems can significantly improve the accuracy of speech recognition for individual speakers and highlight the importance of adapting ASR models to individual voices.
Pages: 116649-116656
Number of pages: 8