Quantification of Automatic Speech Recognition System Performance on d/Deaf and Hard of Hearing Speech

Cited by: 0
Authors
Zhao, Robin [1 ]
Choi, Anna S. G. [2 ]
Koenecke, Allison [2 ]
Rameau, Anais [1 ]
Affiliations
[1] Weill Cornell Med Coll, Sean Parker Inst Voice, New York, NY USA
[2] Cornell Univ, Dept Informat Sci, Ithaca, NY USA
Keywords
artificial intelligence; voice; deaf speech; intelligibility; children; perception; skills
DOI
10.1002/lary.31713
Chinese Library Classification
R-3 [Medical Research Methods]; R3 [Basic Medicine]
Subject Classification Code
1001
Abstract
Objective: To evaluate the performance of commercial automatic speech recognition (ASR) systems on d/Deaf and hard-of-hearing (d/Dhh) speech.
Methods: A corpus containing 850 audio files of d/Dhh and normal-hearing (NH) speech from the University of Memphis Speech Perception Assessment Laboratory was tested on four speech-to-text application programming interfaces (APIs): Amazon Web Services, Microsoft Azure, Google Chirp, and OpenAI Whisper. We quantified the word error rate (WER) of API transcriptions for 24 d/Dhh and nine NH participants and performed subgroup analyses by speech intelligibility classification (SIC), hearing loss (HL) onset, and primary communication mode.
Results: Mean WER averaged across APIs was 10 times higher for the d/Dhh group (52.6%) than for the NH group (5.0%). APIs performed significantly worse for the "low" and "medium" SIC groups (85.9% and 46.6% WER, respectively) than for the "high" SIC group (9.5% WER, comparable to the NH group). APIs performed significantly worse for speakers with prelingual HL than for those with postlingual HL (80.5% vs. 37.1% WER). APIs also performed significantly worse for speakers communicating primarily in sign language (70.2% WER) than for speakers using both oral and sign language communication (51.5%) or oral communication only (19.7%).
Conclusion: Commercial ASR systems underperform for d/Dhh individuals, especially those with "low" and "medium" SIC, prelingual onset of HL, and sign language as their primary communication mode. This contrasts with Big Tech companies' promises of accessibility, indicating the need for ASR systems ethically trained on heterogeneous d/Dhh speech data.
Level of Evidence: 3. Laryngoscope, 2024.
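The study's central metric, word error rate, is the word-level Levenshtein (edit) distance between a reference transcript and an ASR hypothesis, divided by the number of reference words. As an illustration only (this is not the authors' evaluation code, and real pipelines typically normalize punctuation and casing first), a minimal WER computation might look like:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table for word-level edit distance.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # all deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution cost
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # match/substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One dropped word out of six reference words gives WER = 1/6 ≈ 16.7%.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is consistent with the very high error rates reported for the "low" SIC group.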
Pages: 191-197 (7 pages)
Related Papers
50 records
  • [31] Maxillectomy patients' speech and performance of contemporary speaker-independent automatic speech recognition platforms in Japanese
    Ali, Ahmed Sameir Mohamed
    Masaki, Keita
    Hattori, Mariko
    Sumita, Yuka I.
    Wakabayashi, Noriyuki
    JOURNAL OF ORAL REHABILITATION, 2024, 51 (11) : 2361 - 2367
  • [32] Automatic speech recognition in neurodegenerative disease
    Schultz, Benjamin G.
    Tarigoppula, Venkata S. Aditya
    Noffs, Gustavo
    Rojas, Sandra
    van der Walt, Anneke
    Grayden, David B.
    Vogel, Adam P.
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2021, 24 (03) : 771 - 779
  • [33] Effect of simulated hearing loss on automatic speech recognition for an android robot-patient
    Roehl, Jan Hendrik
    Guenther, Ulf
    Hein, Andreas
    Cauchi, Benjamin
    FRONTIERS IN ROBOTICS AND AI, 2024, 11
  • [34] On the Role of Binary Mask Pattern in Automatic Speech Recognition
    Narayanan, Arun
    Wang, DeLiang
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1238 - 1241
  • [35] Effects of Adaptive Hearing Aid Directionality and Noise Reduction on Masked Speech Recognition for Children Who Are Hard of Hearing
    Browning, Jenna M.
    Buss, Emily
    Flaherty, Mary
    Vallier, Tim
    Leibold, Lori J.
    AMERICAN JOURNAL OF AUDIOLOGY, 2019, 28 (01) : 101 - 113
  • [36] The use of automatic speech recognition showing the influence of nasality on speech intelligibility
    Mayr, S.
    Burkhardt, K.
    Schuster, M.
    Rogler, K.
    Maier, A.
    Iro, H.
    EUROPEAN ARCHIVES OF OTO-RHINO-LARYNGOLOGY, 2010, 267 (11) : 1719 - 1725
  • [37] Comparing Humans and Automatic Speech Recognition Systems in Recognizing Dysarthric Speech
    Mengistu, Kinfe Tadesse
    Rudzicz, Frank
    ADVANCES IN ARTIFICIAL INTELLIGENCE, 2011, 6657 : 291 - 300
  • [38] Evaluating Speech Intelligibility for Cochlear Implants Using Automatic Speech Recognition
    Zhou, Hengzhi
    Shi, Mingyue
    Meng, Qinglin
    2024 IEEE 14TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, ISCSLP 2024, 2024, : 1 - 5
  • [39] Improving Hypernasality Estimation with Automatic Speech Recognition in Cleft Palate Speech
    Song, Kaitao
    Wan, Teng
    Wang, Bixia
    Jiang, Huiqiang
    Qiu, Luna
    Xu, Jiahang
    Jiang, Liping
    Lou, Qun
    Yang, Yuqing
    Li, Dongsheng
    Wang, Xudong
    Qiu, Lili
    INTERSPEECH 2022, 2022, : 4820 - 4824
  • [40] Nazareth College: Specialty Preparation for Speech-Language Pathologists to Work With Children who are Deaf and Hard of Hearing
    Brown, Paula M.
    Quenin, Cathy
    VOLTA REVIEW, 2010, 110 (02) : 297 - 304