Counterfactually Fair Automatic Speech Recognition

Cited by: 10
Authors
Sari, Leda [1 ,2 ]
Hasegawa-Johnson, Mark [1 ]
Yoo, Chang D. [3 ]
Affiliations
[1] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA
[2] Meta AI, Menlo Pk, CA 94025 USA
[3] Korea Adv Inst Sci & Technol, Daejeon 34141, South Korea
Keywords
Training; Machine learning; Speech processing; Error analysis; Machine learning algorithms; Computational modeling; Transducers; Automatic speech recognition; speaker adaptation; fairness in machine learning; counterfactual fairness; BIAS;
DOI
10.1109/TASLP.2021.3126949
Chinese Library Classification
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
Widely used automatic speech recognition (ASR) systems have been empirically demonstrated in various studies to be unfair, having higher error rates for some groups of users than others. One way to define fairness in ASR is to require that changing the demographic group affiliation of any individual (e.g., changing their gender, age, education or race) should not change the probability distribution across possible speech-to-text transcriptions. In the paradigm of counterfactual fairness, all variables independent of group affiliation (e.g., the text being read by the speaker) remain unchanged, while variables dependent on group affiliation (e.g., the speaker's voice) are counterfactually modified. Hence, we approach the fairness of ASR by training the ASR to minimize change in its outcome probabilities despite a counterfactual change in the individual's demographic attributes. Starting from the individualized counterfactual equal odds criterion, we provide relaxations of it and compare their performance for connectionist temporal classification (CTC) based end-to-end ASR systems. We perform our experiments on the Corpus of Regional African American Language (CORAAL) and the LibriSpeech dataset to account for differences due to gender, age, education, and race. We show that with counterfactual training, we can reduce average character error rates while achieving a smaller performance gap between demographic groups and a lower error standard deviation among individuals.
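The core idea in the abstract — penalizing any change in the model's output distribution when only demographic-dependent attributes of the input are counterfactually modified — can be sketched as a fairness regularizer added to the task loss. The sketch below is illustrative only, not the authors' implementation: the function names, the choice of symmetric KL as the divergence, and the weight `lam` are all assumptions; the paper's actual relaxations of the counterfactual equal odds criterion differ in detail.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between per-frame categorical distributions,
    clipped for numerical stability."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

def counterfactual_fairness_penalty(probs, probs_cf):
    """Average symmetric KL between the model's per-frame output
    posteriors for an utterance (probs) and for its counterfactual
    version (probs_cf: same text, demographic-dependent voice
    attributes changed). Zero when the distributions match."""
    sym = kl_divergence(probs, probs_cf) + kl_divergence(probs_cf, probs)
    return float(np.mean(sym) / 2.0)

def total_loss(ctc_loss, probs, probs_cf, lam=1.0):
    """Hypothetical combined objective: CTC task loss plus a
    weighted counterfactual fairness penalty."""
    return ctc_loss + lam * counterfactual_fairness_penalty(probs, probs_cf)
```

In a real system the counterfactual posteriors would come from running the same acoustic model on a counterfactually modified utterance (e.g., a voice-converted version of the same read text), and the penalty would be backpropagated jointly with the CTC loss.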
Pages: 3515-3525
Page count: 11
Related papers
50 records in total
  • [31] Dynamic Combination of Automatic Speech Recognition Systems by Driven Decoding
    Lecouteux, Benjamin
    Linares, Georges
    Esteve, Yannick
    Gravier, Guillaume
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (06): : 1251 - 1260
  • [32] The efficient incorporation of MLP features into automatic speech recognition systems
    Park, J.
    Diehl, F.
    Gales, M. J. F.
    Tomalin, M.
    Woodland, P. C.
    COMPUTER SPEECH AND LANGUAGE, 2011, 25 (03) : 519 - 534
  • [33] AUTOMATIC EVALUATION OF ENGLISH PRONUNCIATION BASED ON SPEECH RECOGNITION TECHNIQUES
    HAMADA, H
    MIKI, S
    NAKATSU, R
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 1993, E76D (03) : 352 - 359
  • [34] Study of Deep Learning and CMU Sphinx in Automatic Speech Recognition
    Dhankar, Abhishek
    2017 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2017, : 2296 - 2301
  • [35] Streaming End-to-End Target-Speaker Automatic Speech Recognition and Activity Detection
    Moriya, Takafumi
    Sato, Hiroshi
    Ochiai, Tsubasa
    Delcroix, Marc
    Shinozaki, Takahiro
    IEEE ACCESS, 2023, 11 : 13906 - 13917
  • [36] Automatic Speech Correction: A step to Speech Recognition for People with Disabilities
    Terbeh, Naim
    Labidi, Mohamed
    Zrigui, Mounir
    2013 FOURTH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY AND ACCESSIBILITY (ICTA), 2013,
  • [37] Real and synthetic Punjabi speech datasets for automatic speech recognition
    Singh, Satwinder
    Hou, Feng
    Wang, Ruili
    DATA IN BRIEF, 2024, 52
  • [38] Automatic Speech Recognition with Deep Neural Networks for Impaired Speech
    Espana-Bonet, Cristina
    Fonollosa, Jose A. R.
    ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES, IBERSPEECH 2016, 2016, 10077 : 97 - 107
  • [39] RSC: A Romanian Read Speech Corpus for Automatic Speech Recognition
    Georgescu, Alexandru-Lucian
    Cucu, Horia
    Buzo, Andi
    Burileanu, Corneliu
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 6606 - 6612
  • [40] Bangladeshi Bangla speech corpus for automatic speech recognition research
    Kibria, Shafkat
    Samin, Ahnaf Mozib
    Kobir, M. Humayon
    Rahman, M. Shahidur
    Selim, M. Reza
    Iqbal, M. Zafar
    SPEECH COMMUNICATION, 2022, 136 : 84 - 97