Counterfactually Fair Automatic Speech Recognition

Cited by: 10
Authors
Sari, Leda [1 ,2 ]
Hasegawa-Johnson, Mark [1 ]
Yoo, Chang D. [3 ]
Affiliations
[1] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA
[2] Meta AI, Menlo Pk, CA 94025 USA
[3] Korea Adv Inst Sci & Technol, Daejeon 34141, South Korea
Keywords
Training; Machine learning; Speech processing; Error analysis; Machine learning algorithms; Computational modeling; Transducers; Automatic speech recognition; speaker adaptation; fairness in machine learning; counterfactual fairness; BIAS;
DOI
10.1109/TASLP.2021.3126949
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Widely used automatic speech recognition (ASR) systems have been empirically demonstrated in various studies to be unfair, with higher error rates for some groups of users than for others. One way to define fairness in ASR is to require that changing the demographic group affiliation of any individual (e.g., changing their gender, age, education, or race) should not change the probability distribution across possible speech-to-text transcriptions. In the paradigm of counterfactual fairness, all variables independent of group affiliation (e.g., the text being read by the speaker) remain unchanged, while variables dependent on group affiliation (e.g., the speaker's voice) are counterfactually modified. Hence, we approach the fairness of ASR by training the ASR system to minimize the change in its outcome probabilities despite a counterfactual change in the individual's demographic attributes. Starting from the individualized counterfactual equal odds criterion, we provide relaxations of it and compare their performance for connectionist temporal classification (CTC) based end-to-end ASR systems. We perform our experiments on the Corpus of Regional African American Language (CORAAL) and the LibriSpeech dataset to account for differences due to gender, age, education, and race. We show that with counterfactual training, we can reduce average character error rates while achieving a smaller performance gap between demographic groups and a lower error standard deviation among individuals.
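The counterfactual training objective described in the abstract can be sketched as a task loss plus a penalty on how much the model's output distribution changes when the speaker's demographic attributes are counterfactually flipped. This is a minimal illustration under stated assumptions, not the paper's actual implementation: the function names, the symmetric-KL choice of divergence, and the weight `lam` are all hypothetical stand-ins.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def counterfactually_fair_loss(ctc_loss, probs_original, probs_counterfactual, lam=1.0):
    """Hypothetical sketch: CTC task loss plus a symmetric-KL penalty that
    grows when the model's output distribution changes under a counterfactual
    change of the speaker's demographic attributes (zero when the two agree)."""
    penalty = 0.5 * (kl_divergence(probs_original, probs_counterfactual)
                     + kl_divergence(probs_counterfactual, probs_original))
    return ctc_loss + lam * penalty

# Identical output distributions for the original and counterfactual
# utterance: the penalty vanishes and only the task loss remains.
p = [0.7, 0.2, 0.1]
loss_fair = counterfactually_fair_loss(2.5, p, p)      # -> 2.5
# Diverging outputs are penalized on top of the task loss.
q = [0.2, 0.7, 0.1]
loss_unfair = counterfactually_fair_loss(2.5, p, q)    # > 2.5
```

In a real end-to-end system the two probability vectors would be per-frame softmax outputs of the CTC network for the original audio and its counterfactual version, and the penalty would be averaged over frames and utterances.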
Pages: 3515-3525
Page count: 11
Related papers
50 items in total
  • [21] The WaveSurfer Automatic Speech Recognition Plugin
    Salvi, Giampiero
    Vanhainen, Niklas
    LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014, : 3067 - 3071
  • [22] Automatic speech recognition in neurodegenerative disease
    Benjamin G. Schultz
    Venkata S. Aditya Tarigoppula
    Gustavo Noffs
    Sandra Rojas
    Anneke van der Walt
    David B. Grayden
    Adam P. Vogel
    International Journal of Speech Technology, 2021, 24 : 771 - 779
  • [23] Graphical models and automatic speech recognition
    Bilmes, JA
    MATHEMATICAL FOUNDATIONS OF SPEECH AND LANGUAGE PROCESSING, 2004, 138 : 191 - 245
  • [24] Automatic speech recognition in neurodegenerative disease
    Schultz, Benjamin G.
    Tarigoppula, Venkata S. Aditya
    Noffs, Gustavo
    Rojas, Sandra
    van der Walt, Anneke
    Grayden, David B.
    Vogel, Adam P.
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2021, 24 (03) : 771 - 779
  • [25] Arabic Automatic Speech Recognition Enhancement
    Ahmed, Basem H. A.
    Ghabayen, Ayman S.
    2017 PALESTINIAN INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY (PICICT), 2017, : 98 - 102
  • [26] Allophones in Automatic Whispery Speech Recognition
    Kozierski, Piotr
    Sadalla, Talar
    Drgas, Szymon
    Dabrowski, Adam
    2016 21ST INTERNATIONAL CONFERENCE ON METHODS AND MODELS IN AUTOMATION AND ROBOTICS (MMAR), 2016, : 811 - 815
  • [27] Automatic emotion recognition by the speech signal
    Schuller, B
    Lang, M
    Rigoll, G
    6TH WORLD MULTICONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL IX, PROCEEDINGS: IMAGE, ACOUSTIC, SPEECH AND SIGNAL PROCESSING II, 2002, : 367 - 372
  • [28] Automatic Speech Recognition: An Improved Paradigm
    Topoleanu, Tudor-Sabin
    Mogan, Gheorghe Leonte
    TECHNOLOGICAL INNOVATION FOR SUSTAINABILITY, 2011, 349 : 269 - +
  • [29] Towards automatic recognition of emotion in speech
    Razak, AA
    Yusof, MHM
    Komiya, R
    PROCEEDINGS OF THE 3RD IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY, 2003, : 548 - 551
  • [30] AUTOMATIC EVALUATION OF ENGLISH PRONUNCIATION BASED ON SPEECH RECOGNITION TECHNIQUES
    HAMADA, H
    MIKI, S
    NAKATSU, R
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 1993, E76D (03) : 352 - 359