Quantification of Automatic Speech Recognition System Performance on d/Deaf and Hard of Hearing Speech

Authors
Zhao, Robin [1]
Choi, Anna S. G. [2]
Koenecke, Allison [2]
Rameau, Anais [1]
Affiliations
[1] Weill Cornell Med Coll, Sean Parker Inst Voice, New York, NY USA
[2] Cornell Univ, Dept Informat Sci, Ithaca, NY USA
Keywords
artificial intelligence; voice; DEAF SPEECH; INTELLIGIBILITY; CHILDREN; PERCEPTION; SKILLS;
DOI
10.1002/lary.31713
Chinese Library Classification number
R-3 [Medical research methods]; R3 [Basic medicine];
Discipline classification code
1001
Abstract
Objective: To evaluate the performance of commercial automatic speech recognition (ASR) systems on d/Deaf and hard-of-hearing (d/Dhh) speech.
Methods: A corpus containing 850 audio files of d/Dhh and normal hearing (NH) speech from the University of Memphis Speech Perception Assessment Laboratory was tested on four speech-to-text application program interfaces (APIs): Amazon Web Services, Microsoft Azure, Google Chirp, and OpenAI Whisper. We quantified the Word Error Rate (WER) of API transcriptions for 24 d/Dhh and nine NH participants and performed subgroup analysis by speech intelligibility classification (SIC), hearing loss (HL) onset, and primary communication mode.
Results: Mean WER averaged across APIs was 10 times higher for the d/Dhh group (52.6%) than the NH group (5.0%). APIs performed significantly worse for the "low" and "medium" SIC groups (85.9% and 46.6% WER, respectively) than for the "high" SIC group (9.5% WER, comparable to the NH group). APIs performed significantly worse for speakers with prelingual HL relative to postlingual HL (80.5% and 37.1% WER, respectively). APIs performed significantly worse for speakers primarily communicating with sign language (70.2% WER) relative to speakers with both oral and sign language communication (51.5%) or oral communication only (19.7%).
Conclusion: Commercial ASR systems underperform for d/Dhh individuals, especially those with "low" and "medium" SIC, prelingual onset of HL, and sign language as primary communication mode. This contrasts with Big Tech companies' promises of accessibility, indicating the need for ASR systems ethically trained on heterogeneous d/Dhh speech data.
Level of Evidence: 3
Laryngoscope, 2024
Commercial automatic speech recognition (ASR) systems underperform for d/Deaf and hard-of-hearing (d/Dhh) individuals, especially those with "low" and "medium" speech intelligibility classification, prelingual onset of hearing loss, and sign language as primary communication mode. There is a need for ASR systems ethically trained on heterogeneous d/Dhh speech data.
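The abstract reports Word Error Rate (WER) but this record does not say how transcripts were normalized or scored. As a rough illustration only, WER is conventionally the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the ASR output, divided by the number of reference words. The sketch below assumes lowercasing and whitespace tokenization; the function name `word_error_rate` and the sample sentences are hypothetical and not taken from the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: text is normalized by lowercasing and splitting on
    whitespace. The study's actual normalization is not stated here.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Two substitutions and one insertion against four reference words.
    print(word_error_rate("the boy ran home", "a boy run home now"))  # 0.75
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference. A group-level figure such as those in the Results would then be an average of per-file or per-speaker values, though the exact aggregation used in the study is not given in this record.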
Pages: 191 - 197
Page count: 7
Related papers
50 in total
  • [21] Self-conducted speech audiometry using automatic speech recognition: Simulation results for listeners with hearing loss
    Ooster, Jasper
    Tuschen, Laura
    Meyer, Bernd T.
    COMPUTER SPEECH AND LANGUAGE, 2023, 78
  • [22] Quantifying and Improving the Performance of Speech Recognition Systems on Dysphonic Speech
    Lopez, Julio C. Hidalgo C.
    Sandeep, Shelly
    Wright, MaKayla
    Wandell, Grace M. M.
    Law, Anthony B. B.
    OTOLARYNGOLOGY-HEAD AND NECK SURGERY, 2023, 168 (05) : 1130 - 1138
  • [23] A 2-STEP SEGMENTATION METHOD FOR AUTOMATIC RECOGNITION OF SPEECH OF PERSONS WHO ARE DEAF
    ABDELHAMIED, KA
    WALDRON, MB
    FOX, RA
    JOURNAL OF REHABILITATION RESEARCH AND DEVELOPMENT, 1992, 29 (03) : 45 - 56
  • [24] Using Automatic Speech Recognition to Optimize Hearing-Aid Time Constants
    Fontan, Lionel
    Goncalves Braz, Libio
    Pinquier, Julien
    Stone, Michael A.
    Fullgrabe, Christian
    FRONTIERS IN NEUROSCIENCE, 2022, 16
  • [25] Effects of Age and Hearing Loss on the Recognition of Emotions in Speech
    Christensen, Julie A.
    Sis, Jenni
    Kulkarni, Aditya M.
    Chatterjee, Monita
    EAR AND HEARING, 2019, 40 (05) : 1069 - 1083
  • [26] Speech Technology for Automatic Recognition and Assessment of Dysarthric Speech: An Overview
    Bhat, Chitralekha
    Strik, Helmer
    JOURNAL OF SPEECH LANGUAGE AND HEARING RESEARCH, 2025, 68 (02) : 547 - 577
  • [27] Spoken Word Recognition Errors in Speech Audiometry: A Measure of Hearing Performance?
    Coene, Martine
    van der Lee, Anneke
    Govaerts, Paul J.
    BIOMED RESEARCH INTERNATIONAL, 2015, 2015
  • [28] Automatic Speech Recognition Systems for the Evaluation of Voice and Speech Disorders in Head and Neck Cancer
    Maier, Andreas
    Haderlein, Tino
    Stelzle, Florian
    Noeth, Elmar
    Nkenke, Emeka
    Rosanowski, Frank
    Schuetzenberger, Anne
    Schuster, Maria
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2010
  • [29] Low-resource automatic speech recognition and error analyses of oral cancer speech
    Halpern, Bence Mark
    Feng, Siyuan
    van Son, Rob
    van den Brekel, Michiel
    Scharenborg, Odette
    SPEECH COMMUNICATION, 2022, 141 : 14 - 27
  • [30] Application of Automatic Speech Recognition to Quantitative Assessment of Tracheoesophageal Speech with Different Signal Quality
    Haderlein, Tino
    Riedhammer, Korbinian
    Noeth, Elmar
    Toy, Hikmet
    Schuster, Maria
    Eysholdt, Ulrich
    Hornegger, Joachim
    Rosanowski, Frank
    FOLIA PHONIATRICA ET LOGOPAEDICA, 2009, 61 (01) : 12 - 17