Counterfactually Fair Automatic Speech Recognition

Cited by: 10
Authors
Sari, Leda [1 ,2 ]
Hasegawa-Johnson, Mark [1 ]
Yoo, Chang D. [3 ]
Affiliations
[1] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA
[2] Meta AI, Menlo Pk, CA 94025 USA
[3] Korea Adv Inst Sci & Technol, Daejeon 34141, South Korea
Keywords
Training; Machine learning; Speech processing; Error analysis; Machine learning algorithms; Computational modeling; Transducers; Automatic speech recognition; speaker adaptation; fairness in machine learning; counterfactual fairness; bias
DOI
10.1109/TASLP.2021.3126949
Chinese Library Classification
O42 [Acoustics]
Discipline Classification Codes
070206; 082403
Abstract
Widely used automatic speech recognition (ASR) systems have been empirically demonstrated in various studies to be unfair, having higher error rates for some groups of users than for others. One way to define fairness in ASR is to require that changing the demographic group affiliation of any individual (e.g., changing their gender, age, education, or race) should not change the probability distribution across possible speech-to-text transcriptions. In the paradigm of counterfactual fairness, all variables independent of group affiliation (e.g., the text being read by the speaker) remain unchanged, while variables dependent on group affiliation (e.g., the speaker's voice) are counterfactually modified. Hence, we approach fairness in ASR by training the ASR system to minimize the change in its outcome probabilities despite a counterfactual change in the individual's demographic attributes. Starting from the individualized counterfactual equal odds criterion, we provide relaxations of it and compare their performance for connectionist temporal classification (CTC) based end-to-end ASR systems. We perform our experiments on the Corpus of Regional African American Language (CORAAL) and the LibriSpeech dataset to account for differences due to gender, age, education, and race. We show that with counterfactual training, we can reduce average character error rates while achieving a smaller performance gap between demographic groups and a lower standard deviation of error rates across individuals.
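The training objective described in the abstract, i.e. penalizing changes in the model's output distribution under a counterfactual change of demographic attributes, can be sketched as a consistency regularizer added to the base CTC loss. The sketch below is illustrative only: the symmetric-KL choice, the function names, and the weight `lam` are assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete probability distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def counterfactual_fairness_penalty(probs_actual, probs_counterfactual):
    """Average symmetric KL between per-frame ASR output distributions
    computed for the actual speaker and for a counterfactual version of
    the same speaker (same text, demographically modified voice)."""
    per_frame = [
        0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
        for p, q in zip(probs_actual, probs_counterfactual)
    ]
    return float(np.mean(per_frame))

def total_loss(ctc_loss, probs_actual, probs_counterfactual, lam=0.1):
    """Base CTC objective plus a weighted counterfactual-consistency
    term; lam trades off transcription accuracy against fairness."""
    return ctc_loss + lam * counterfactual_fairness_penalty(
        probs_actual, probs_counterfactual)
```

The penalty is zero exactly when the output distributions are unchanged by the counterfactual modification, which is the intuition behind minimizing "change in outcome probabilities" across demographic attributes.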
Pages: 3515-3525
Page count: 11