Counterfactually Fair Automatic Speech Recognition

Cited by: 10
Authors
Sari, Leda [1 ,2 ]
Hasegawa-Johnson, Mark [1 ]
Yoo, Chang D. [3 ]
Affiliations
[1] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA
[2] Meta AI, Menlo Pk, CA 94025 USA
[3] Korea Adv Inst Sci & Technol, Daejeon 34141, South Korea
Keywords
Training; Machine learning; Speech processing; Error analysis; Machine learning algorithms; Computational modeling; Transducers; Automatic speech recognition; speaker adaptation; fairness in machine learning; counterfactual fairness; bias
DOI
10.1109/TASLP.2021.3126949
Chinese Library Classification
O42 [Acoustics]
Discipline Classification Codes
070206; 082403
Abstract
Widely used automatic speech recognition (ASR) systems have been empirically demonstrated in various studies to be unfair, having higher error rates for some groups of users than for others. One way to define fairness in ASR is to require that changing the demographic group affiliation of any individual (e.g., changing their gender, age, education, or race) should not change the probability distribution across possible speech-to-text transcriptions. In the paradigm of counterfactual fairness, all variables independent of group affiliation (e.g., the text being read by the speaker) remain unchanged, while variables dependent on group affiliation (e.g., the speaker's voice) are counterfactually modified. Hence, we approach fairness in ASR by training the ASR system to minimize the change in its outcome probabilities despite a counterfactual change in the individual's demographic attributes. Starting from the individualized counterfactual equal odds criterion, we provide relaxations of it and compare their performance for connectionist temporal classification (CTC) based end-to-end ASR systems. We perform our experiments on the Corpus of Regional African American Language (CORAAL) and the LibriSpeech dataset to account for differences due to gender, age, education, and race. We show that with counterfactual training, we can reduce average character error rates while achieving a smaller performance gap between demographic groups and a lower standard deviation of error rates across individuals.
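The training objective described in the abstract, i.e. penalizing changes in the model's output distribution under a counterfactual change of demographic attributes, can be sketched as a consistency regularizer added to the base CTC loss. The sketch below is illustrative only: the symmetric-KL choice, the function names, and the weight `lam` are assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete probability distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def counterfactual_fairness_penalty(probs_actual, probs_counterfactual):
    """Average symmetric KL between per-frame ASR output distributions
    computed for the actual speaker and for a counterfactual version of
    the same speaker (same text, demographically modified voice)."""
    per_frame = [
        0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
        for p, q in zip(probs_actual, probs_counterfactual)
    ]
    return float(np.mean(per_frame))

def total_loss(ctc_loss, probs_actual, probs_counterfactual, lam=0.1):
    """Base CTC objective plus a weighted counterfactual-consistency
    term; lam trades off transcription accuracy against fairness."""
    return ctc_loss + lam * counterfactual_fairness_penalty(
        probs_actual, probs_counterfactual)
```

The penalty is zero exactly when the output distributions are unchanged by the counterfactual modification, which is the intuition behind minimizing "change in outcome probabilities" across demographic attributes.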
Pages: 3515-3525
Page count: 11