The effect of speech pathology on automatic speaker verification: a large-scale study

Cited by: 2
Authors
Tayebi Arasteh, Soroosh [1, 2, 3]
Weise, Tobias [1, 2]
Schuster, Maria [4 ]
Noeth, Elmar [1 ]
Maier, Andreas [1 ]
Yang, Seung Hee [2 ]
Affiliations
[1] Friedrich Alexander Univ Erlangen Nurnberg, Pattern Recognit Lab, D-91058 Erlangen, Germany
[2] Friedrich Alexander Univ Erlangen Nurnberg, Speech & Language Proc Lab, D-91054 Erlangen, Germany
[3] Univ Hosp RWTH Aachen, Dept Diagnost & Intervent Radiol, D-52074 Aachen, Germany
[4] Ludwig Maximilians Univ Munchen, Dept Otorhinolaryngol Head & Neck Surg, D-80333 Munich, Germany
Keywords
RECOGNITION; VOICE; ANONYMIZATION; ASR
DOI
10.1038/s41598-023-47711-7
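Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract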
Navigating the challenges of data-driven speech processing, one of the primary hurdles is accessing reliable pathological speech data. While public datasets appear to offer solutions, they come with inherent risks of potential unintended exposure of patient health information via re-identification attacks. Using a comprehensive real-world pathological speech corpus, with over n=3800 test subjects spanning various age groups and speech disorders, we employed a deep-learning-driven automatic speaker verification (ASV) approach. This resulted in a notable mean equal error rate (EER) of 0.89 ± 0.06%, outstripping traditional benchmarks. Our comprehensive assessments demonstrate that pathological speech overall faces heightened privacy breach risks compared to healthy speech. Specifically, adults with dysphonia are at heightened re-identification risks, whereas conditions like dysarthria yield results comparable to those of healthy speakers. Crucially, speech intelligibility does not influence the ASV system's performance metrics. In pediatric cases, particularly those with cleft lip and palate, the recording environment plays a decisive role in re-identification. Merging data across pathological types led to a marked EER decrease, suggesting the potential benefits of pathological diversity in ASV, accompanied by a logarithmic boost in ASV effectiveness. In essence, this research sheds light on the dynamics between pathological speech and speaker verification, emphasizing its crucial role in safeguarding patient confidentiality in our increasingly digitized healthcare era.
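The equal error rate quoted in the abstract is the operating point at which the false acceptance rate equals the false rejection rate. A minimal sketch of how such a figure can be estimated from verification scores is given here. The function name `equal_error_rate`, the toy scores, and the simple threshold sweep are illustrative assumptions and are not taken from the authors' evaluation code.

```python
from typing import Sequence, Tuple


def equal_error_rate(
    genuine: Sequence[float], impostor: Sequence[float]
) -> Tuple[float, float]:
    """Return (eer, threshold) for similarity scores.

    Assumption: a higher score means "more likely the same speaker".
    """
    if not genuine or not impostor:
        raise ValueError("need at least one genuine and one impostor score")

    best_gap, best_eer, best_thr = float("inf"), 1.0, 0.0
    # Sweep every observed score as a candidate decision threshold.
    for thr in sorted(set(genuine) | set(impostor)):
        # False rejection: a genuine trial scored below the threshold.
        frr = sum(s < thr for s in genuine) / len(genuine)
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= thr for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if gap < best_gap:
            # Report the midpoint of FAR and FRR where they are closest.
            best_gap, best_eer, best_thr = gap, (far + frr) / 2, thr
    return best_eer, best_thr


# Hypothetical toy scores, for illustration only.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With real trial lists the sweep is usually replaced by an interpolated ROC or DET curve, but the quantity being estimated is the same.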
Pages: 14