Quantification of Automatic Speech Recognition System Performance on d/Deaf and Hard of Hearing Speech

Cited by: 0
Authors
Zhao, Robin [1 ]
Choi, Anna S. G. [2 ]
Koenecke, Allison [2 ]
Rameau, Anais [1 ]
Affiliations
[1] Weill Cornell Med Coll, Sean Parker Inst Voice, New York, NY USA
[2] Cornell Univ, Dept Informat Sci, Ithaca, NY USA
Keywords
artificial intelligence; voice; DEAF SPEECH; INTELLIGIBILITY; CHILDREN; PERCEPTION; SKILLS;
DOI
10.1002/lary.31713
Chinese Library Classification (CLC)
R-3 [Medical research methods]; R3 [Basic medicine]
Subject Classification Code
1001
Abstract
Objective: To evaluate the performance of commercial automatic speech recognition (ASR) systems on d/Deaf and hard-of-hearing (d/Dhh) speech.
Methods: A corpus containing 850 audio files of d/Dhh and normal-hearing (NH) speech from the University of Memphis Speech Perception Assessment Laboratory was tested on four speech-to-text application programming interfaces (APIs): Amazon Web Services, Microsoft Azure, Google Chirp, and OpenAI Whisper. We quantified the word error rate (WER) of API transcriptions for 24 d/Dhh and nine NH participants and performed subgroup analyses by speech intelligibility classification (SIC), hearing loss (HL) onset, and primary communication mode.
Results: Mean WER averaged across APIs was 10 times higher for the d/Dhh group (52.6%) than for the NH group (5.0%). APIs performed significantly worse for the "low" and "medium" SIC groups (85.9% and 46.6% WER, respectively) than for the "high" SIC group (9.5% WER, comparable to the NH group). APIs performed significantly worse for speakers with prelingual HL than for those with postlingual HL (80.5% vs. 37.1% WER). APIs performed significantly worse for speakers communicating primarily in sign language (70.2% WER) than for speakers using both oral and sign language communication (51.5% WER) or oral communication only (19.7% WER).
Conclusion: Commercial ASR systems underperform for d/Dhh individuals, especially those with "low" and "medium" SIC, prelingual onset of HL, and sign language as their primary communication mode. This contrasts with Big Tech companies' promises of accessibility, indicating the need for ASR systems ethically trained on heterogeneous d/Dhh speech data.
Level of Evidence: 3. Laryngoscope, 2024.
Lay Summary: Commercial automatic speech recognition (ASR) systems underperform for d/Deaf and hard-of-hearing (d/Dhh) individuals, especially those with "low" and "medium" speech intelligibility classification, prelingual onset of hearing loss, and sign language as their primary communication mode. There is a need for ASR systems ethically trained on heterogeneous d/Dhh speech data.
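The abstract's central metric, word error rate (WER), is the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and an ASR hypothesis, divided by the number of reference words. The sketch below is illustrative only; the study's exact scoring pipeline (text normalization, tooling) is not specified in the abstract.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# Example: the hypothesis drops two of six reference words
print(wer("the cat sat on the mat", "the cat sat mat"))
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why severely unintelligible speech can yield mean WERs like the 85.9% reported for the "low" SIC group.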
Pages: 191-197
Page count: 7