Counterfactually Fair Automatic Speech Recognition

Cited by: 10
Authors
Sari, Leda [1 ,2 ]
Hasegawa-Johnson, Mark [1 ]
Yoo, Chang D. [3 ]
Affiliations
[1] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA
[2] Meta AI, Menlo Pk, CA 94025 USA
[3] Korea Adv Inst Sci & Technol, Daejeon 34141, South Korea
Keywords
Training; Machine learning; Speech processing; Error analysis; Machine learning algorithms; Computational modeling; Transducers; Automatic speech recognition; speaker adaptation; fairness in machine learning; counterfactual fairness; BIAS;
DOI
10.1109/TASLP.2021.3126949
CLC number
O42 [Acoustics]
Subject classification codes
070206 ; 082403 ;
Abstract
Widely used automatic speech recognition (ASR) systems have been empirically demonstrated in various studies to be unfair, having higher error rates for some groups of users than others. One way to define fairness in ASR is to require that changing the demographic group affiliation of any individual (e.g., changing their gender, age, education or race) should not change the probability distribution across possible speech-to-text transcriptions. In the paradigm of counterfactual fairness, all variables independent of group affiliation (e.g., the text being read by the speaker) remain unchanged, while variables dependent on group affiliation (e.g., the speaker's voice) are counterfactually modified. Hence, we approach the fairness of ASR by training the ASR to minimize change in its outcome probabilities despite a counterfactual change in the individual's demographic attributes. Starting from the individualized counterfactual equal odds criterion, we provide relaxations of it and compare their performance for connectionist temporal classification (CTC) based end-to-end ASR systems. We perform our experiments on the Corpus of Regional African American Language (CORAAL) and the LibriSpeech dataset to account for differences due to gender, age, education, and race. We show that with counterfactual training, we can reduce average character error rates while achieving a smaller performance gap between demographic groups and a lower standard deviation of error across individuals.
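The core training idea in the abstract can be sketched as a penalty that discourages the model's output distribution from shifting when the same utterance text is rendered with counterfactually modified speaker attributes. The sketch below is illustrative only and not the authors' implementation: it assumes frame-level logits are available for a factual and a counterfactual version of the same utterance, and measures their divergence with a per-frame KL term that would be added to the CTC loss with some weight `lam` (a hypothetical hyperparameter name).

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax over the output-label dimension."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def counterfactual_fairness_penalty(logits_factual, logits_counterfactual):
    """Mean per-frame KL divergence D(p_factual || p_counterfactual).

    Both inputs are (T, V) arrays of frame-level logits over the output
    vocabulary, computed for the same utterance text spoken with the
    original vs. counterfactually modified speaker characteristics.
    A value of 0 means the counterfactual change left the model's
    output distribution unchanged at every frame.
    """
    p = softmax(logits_factual)
    q = softmax(logits_counterfactual)
    eps = 1e-12  # guard against log(0)
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    return float(np.mean(kl))

# Hypothetical combined objective (sketch, not the paper's exact relaxation):
#   total_loss = ctc_loss + lam * counterfactual_fairness_penalty(f, cf)
```

Minimizing this penalty alongside the usual CTC loss pushes the recognizer toward outcome probabilities that are invariant to the counterfactual demographic change, which is the training goal the abstract describes.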
Pages: 3515-3525
Page count: 11