Exploring racial and gender disparities in voice biometrics

Cited by: 0
Authors
Xingyu Chen
Zhengxiong Li
Srirangaraj Setlur
Wenyao Xu
Affiliations
[1] University of Colorado Denver, CSE
[2] University at Buffalo, CSE
[3] SUNY
Abstract
Systemic inequity in biometric systems arising from racial and gender disparities has received considerable attention recently. These disparities have been explored in existing biometric systems such as facial biometrics (identifying individuals based on facial attributes). However, such ethical issues remain largely unexplored in voice biometric systems, which are very popular and used extensively worldwide. Using a corpus of non-speech voice records from a group of 300 speakers diverse in race (75 each from White, Black, Asian, and Latinx subgroups) and gender (150 each from female and male subgroups), we show that racial subgroups have similar voice characteristics, whereas gender subgroups have significantly different voice characteristics. Moreover, an analysis of one commercial product and five research products reveals non-negligible racial and gender disparities in speaker identification accuracy. The average accuracy for Latinx speakers can be 12% lower than for White speakers (p < 0.05, 95% CI 1.58%, 14.15%), and accuracy can be significantly higher for female speakers than for male speakers (3.67% higher, p < 0.05, 95% CI 1.23%, 11.57%). We further find that racial disparities result primarily from the neural network-based feature extraction within the voice biometric product, whereas gender disparities result from both inherent differences in voice characteristics and neural network-based feature extraction. Finally, we point out strategies (e.g., feature extraction optimization) for incorporating fairness and inclusiveness considerations into biometrics technology.