The effect of speech pathology on automatic speaker verification: a large-scale study

Cited by: 4
Authors
Tayebi Arasteh, Soroosh [1, 2, 3]
Weise, Tobias [1, 2]
Schuster, Maria [4]
Noeth, Elmar [1]
Maier, Andreas [1]
Yang, Seung Hee [2]
Affiliations
[1] Friedrich Alexander Univ Erlangen Nurnberg, Pattern Recognit Lab, D-91058 Erlangen, Germany
[2] Friedrich Alexander Univ Erlangen Nurnberg, Speech & Language Proc Lab, D-91054 Erlangen, Germany
[3] Univ Hosp RWTH Aachen, Dept Diagnost & Intervent Radiol, D-52074 Aachen, Germany
[4] Ludwig Maximilians Univ Munchen, Dept Otorhinolaryngol Head & Neck Surg, D-80333 Munich, Germany
Keywords
RECOGNITION; VOICE; ANONYMIZATION; ASR
DOI
10.1038/s41598-023-47711-7
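Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract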
Navigating the challenges of data-driven speech processing, one of the primary hurdles is accessing reliable pathological speech data. While public datasets appear to offer solutions, they come with inherent risks of potential unintended exposure of patient health information via re-identification attacks. Using a comprehensive real-world pathological speech corpus, with over n=3800 test subjects spanning various age groups and speech disorders, we employed a deep-learning-driven automatic speaker verification (ASV) approach. This resulted in a notable mean equal error rate (EER) of 0.89 ± 0.06%, outstripping traditional benchmarks. Our comprehensive assessments demonstrate that pathological speech overall faces heightened privacy breach risks compared to healthy speech. Specifically, adults with dysphonia are at heightened re-identification risks, whereas conditions like dysarthria yield results comparable to those of healthy speakers. Crucially, speech intelligibility does not influence the ASV system's performance metrics. In pediatric cases, particularly those with cleft lip and palate, the recording environment plays a decisive role in re-identification. Merging data across pathological types led to a marked EER decrease, suggesting the potential benefits of pathological diversity in ASV, accompanied by a logarithmic boost in ASV effectiveness. In essence, this research sheds light on the dynamics between pathological speech and speaker verification, emphasizing its crucial role in safeguarding patient confidentiality in our increasingly digitized healthcare era.
Pages: 14