Counterfactually Fair Automatic Speech Recognition

Cited by: 10
Authors
Sari, Leda [1 ,2 ]
Hasegawa-Johnson, Mark [1 ]
Yoo, Chang D. [3 ]
Affiliations
[1] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA
[2] Meta AI, Menlo Pk, CA 94025 USA
[3] Korea Adv Inst Sci & Technol, Daejeon 34141, South Korea
Keywords
Training; Machine learning; Speech processing; Error analysis; Machine learning algorithms; Computational modeling; Transducers; Automatic speech recognition; speaker adaptation; fairness in machine learning; counterfactual fairness; BIAS;
DOI
10.1109/TASLP.2021.3126949
Chinese Library Classification
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
Widely used automatic speech recognition (ASR) systems have been empirically demonstrated in various studies to be unfair, having higher error rates for some groups of users than others. One way to define fairness in ASR is to require that changing the demographic group affiliation of any individual (e.g., changing their gender, age, education or race) should not change the probability distribution across possible speech-to-text transcriptions. In the paradigm of counterfactual fairness, all variables independent of group affiliation (e.g., the text being read by the speaker) remain unchanged, while variables dependent on group affiliation (e.g., the speaker's voice) are counterfactually modified. Hence, we approach the fairness of ASR by training the ASR to minimize change in its outcome probabilities despite a counterfactual change in the individual's demographic attributes. Starting from the individualized counterfactual equal odds criterion, we provide relaxations of it and compare their performance for connectionist temporal classification (CTC) based end-to-end ASR systems. We perform our experiments on the Corpus of Regional African American Language (CORAAL) and the LibriSpeech dataset to account for differences due to gender, age, education, and race. We show that with counterfactual training, we can reduce average character error rates while achieving a smaller performance gap between demographic groups and a lower error standard deviation among individuals.
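The core idea in the abstract — penalizing any change in the model's output distribution when only demographic-dependent attributes of the input are counterfactually modified — can be sketched as a fairness regularizer added to the task loss. The sketch below is illustrative only, not the authors' implementation: the function names, the choice of symmetric KL as the divergence, and the weight `lam` are all assumptions; the paper's actual relaxations of the counterfactual equal odds criterion differ in detail.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between per-frame categorical distributions,
    clipped for numerical stability."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

def counterfactual_fairness_penalty(probs, probs_cf):
    """Average symmetric KL between the model's per-frame output
    posteriors for an utterance (probs) and for its counterfactual
    version (probs_cf: same text, demographic-dependent voice
    attributes changed). Zero when the distributions match."""
    sym = kl_divergence(probs, probs_cf) + kl_divergence(probs_cf, probs)
    return float(np.mean(sym) / 2.0)

def total_loss(ctc_loss, probs, probs_cf, lam=1.0):
    """Hypothetical combined objective: CTC task loss plus a
    weighted counterfactual fairness penalty."""
    return ctc_loss + lam * counterfactual_fairness_penalty(probs, probs_cf)
```

In a real system the counterfactual posteriors would come from running the same acoustic model on a counterfactually modified utterance (e.g., a voice-converted version of the same read text), and the penalty would be backpropagated jointly with the CTC loss.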
Pages: 3515-3525
Page count: 11
Related papers
50 records in total
  • [31] Dynamic Combination of Automatic Speech Recognition Systems by Driven Decoding
    Lecouteux, Benjamin
    Linares, Georges
    Esteve, Yannick
    Gravier, Guillaume
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (06): : 1251 - 1260
  • [32] The efficient incorporation of MLP features into automatic speech recognition systems
    Park, J.
    Diehl, F.
    Gales, M. J. F.
    Tomalin, M.
    Woodland, P. C.
    COMPUTER SPEECH AND LANGUAGE, 2011, 25 (03) : 519 - 534
  • [33] AUTOMATIC EVALUATION OF ENGLISH PRONUNCIATION BASED ON SPEECH RECOGNITION TECHNIQUES
    HAMADA, H
    MIKI, S
    NAKATSU, R
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 1993, E76D (03) : 352 - 359
  • [34] Study of Deep Learning and CMU Sphinx in Automatic Speech Recognition
    Dhankar, Abhishek
    2017 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2017, : 2296 - 2301
  • [35] Streaming End-to-End Target-Speaker Automatic Speech Recognition and Activity Detection
    Moriya, Takafumi
    Sato, Hiroshi
    Ochiai, Tsubasa
    Delcroix, Marc
    Shinozaki, Takahiro
    IEEE ACCESS, 2023, 11 : 13906 - 13917
  • [36] Automatic Speech Correction: A step to Speech Recognition for People with Disabilities
    Terbeh, Naim
    Labidi, Mohamed
    Zrigui, Mounir
    2013 FOURTH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY AND ACCESSIBILITY (ICTA), 2013,
  • [37] Real and synthetic Punjabi speech datasets for automatic speech recognition
    Singh, Satwinder
    Hou, Feng
    Wang, Ruili
    DATA IN BRIEF, 2024, 52
  • [38] Automatic Speech Recognition with Deep Neural Networks for Impaired Speech
    Espana-Bonet, Cristina
    Fonollosa, Jose A. R.
    ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES, IBERSPEECH 2016, 2016, 10077 : 97 - 107
  • [39] RSC: A Romanian Read Speech Corpus for Automatic Speech Recognition
    Georgescu, Alexandru-Lucian
    Cucu, Horia
    Buzo, Andi
    Burileanu, Corneliu
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 6606 - 6612
  • [40] Bangladeshi Bangla speech corpus for automatic speech recognition research
    Kibria, Shafkat
    Samin, Ahnaf Mozib
    Kobir, M. Humayon
    Rahman, M. Shahidur
    Selim, M. Reza
    Iqbal, M. Zafar
    SPEECH COMMUNICATION, 2022, 136 : 84 - 97