Counterfactually Fair Automatic Speech Recognition

Cited by: 10
Authors
Sari, Leda [1 ,2 ]
Hasegawa-Johnson, Mark [1 ]
Yoo, Chang D. [3 ]
Affiliations
[1] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA
[2] Meta AI, Menlo Pk, CA 94025 USA
[3] Korea Adv Inst Sci & Technol, Daejeon 34141, South Korea
Keywords
Training; Machine learning; Speech processing; Error analysis; Machine learning algorithms; Computational modeling; Transducers; Automatic speech recognition; speaker adaptation; fairness in machine learning; counterfactual fairness; BIAS;
DOI
10.1109/TASLP.2021.3126949
CLC number
O42 [Acoustics]
Subject classification codes
070206 ; 082403 ;
Abstract
Widely used automatic speech recognition (ASR) systems have been empirically demonstrated in various studies to be unfair, having higher error rates for some groups of users than others. One way to define fairness in ASR is to require that changing the demographic group affiliation of any individual (e.g., changing their gender, age, education or race) should not change the probability distribution across possible speech-to-text transcriptions. In the paradigm of counterfactual fairness, all variables independent of group affiliation (e.g., the text being read by the speaker) remain unchanged, while variables dependent on group affiliation (e.g., the speaker's voice) are counterfactually modified. Hence, we approach the fairness of ASR by training the ASR to minimize change in its outcome probabilities despite a counterfactual change in the individual's demographic attributes. Starting from the individualized counterfactual equal odds criterion, we provide relaxations of it and compare their performance for connectionist temporal classification (CTC) based end-to-end ASR systems. We perform our experiments on the Corpus of Regional African American Language (CORAAL) and the LibriSpeech dataset to account for differences due to gender, age, education, and race. We show that with counterfactual training, we can reduce average character error rates while achieving a smaller performance gap between demographic groups and a lower standard deviation of error across individuals.
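The core training idea in the abstract can be sketched as a penalty that discourages the model's output distribution from shifting when the same utterance text is rendered with counterfactually modified speaker attributes. The sketch below is illustrative only and not the authors' implementation: it assumes frame-level logits are available for a factual and a counterfactual version of the same utterance, and measures their divergence with a per-frame KL term that would be added to the CTC loss with some weight `lam` (a hypothetical hyperparameter name).

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax over the output-label dimension."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def counterfactual_fairness_penalty(logits_factual, logits_counterfactual):
    """Mean per-frame KL divergence D(p_factual || p_counterfactual).

    Both inputs are (T, V) arrays of frame-level logits over the output
    vocabulary, computed for the same utterance text spoken with the
    original vs. counterfactually modified speaker characteristics.
    A value of 0 means the counterfactual change left the model's
    output distribution unchanged at every frame.
    """
    p = softmax(logits_factual)
    q = softmax(logits_counterfactual)
    eps = 1e-12  # guard against log(0)
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    return float(np.mean(kl))

# Hypothetical combined objective (sketch, not the paper's exact relaxation):
#   total_loss = ctc_loss + lam * counterfactual_fairness_penalty(f, cf)
```

Minimizing this penalty alongside the usual CTC loss pushes the recognizer toward outcome probabilities that are invariant to the counterfactual demographic change, which is the training goal the abstract describes.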
Pages: 3515-3525
Page count: 11