Effect of identical twins on deep speaker embeddings based forensic voice comparison

Cited by: 0
Authors
Abed M.H. [1]
Sztahó D. [1]
Affiliations
[1] Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Magyar tudósok körútja, 2, Budapest
Keywords
ECAPA-TDNN; Forensic voice comparison; Likelihood-ratio framework; Speaker verification; Twins; X-vectors;
DOI
10.1007/s10772-024-10108-6
Abstract
Deep learning has gained widespread adoption in forensic voice comparison in recent years. It is mainly used to learn speaker representations, known as embedding features or vectors. In this work, the effect of identical twins on two state-of-the-art deep speaker embedding methods was investigated, with special focus on the metrics of forensic voice comparison. Speaker verification performance was assessed within the likelihood-ratio framework using the log-likelihood-ratio cost and the equal error rate (EER). The AVTD twin speech dataset was used. The results show a significant reduction in speaker verification performance when twin samples are present. Neither adapting the likelihood-ratio (LR) score calculation to twin samples nor fine-tuning the pre-trained speaker embedding models was able to overcome this limitation. Distinguishing same-speaker from different-speaker pairs remained possible even for identical twins, but performance dropped greatly: the lowest EER of the best-performing model was 3.4% for non-twin speakers and 25.3% when twins were present. This does not mean that the presented methods are useless in the case of identical twins, but a higher likelihood-ratio score (which indicates that the tested samples come from the same speaker) must be interpreted with the possibility of twins in mind in real casework. © The Author(s) 2024.
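The two metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names `cllr` and `eer` are hypothetical, the scores are assumed to be natural-log likelihood ratios, and the EER is approximated by a simple sweep over the observed scores rather than by interpolation.

```python
import math


def cllr(same_llrs, diff_llrs):
    """Log-likelihood-ratio cost for natural-log LRs (assumed input format).

    Cllr = 0.5 * (mean log2(1 + 1/LR) over same-speaker trials
                  + mean log2(1 + LR) over different-speaker trials).
    """
    def softplus_bits(x):
        # log2(1 + exp(x)), written to avoid overflow for large |x|
        return (max(x, 0.0) + math.log1p(math.exp(-abs(x)))) / math.log(2)

    cost_same = sum(softplus_bits(-l) for l in same_llrs) / len(same_llrs)
    cost_diff = sum(softplus_bits(l) for l in diff_llrs) / len(diff_llrs)
    return 0.5 * (cost_same + cost_diff)


def eer(same_scores, diff_scores):
    """Approximate equal error rate by sweeping thresholds over observed scores."""
    best_gap, best_rate = None, None
    for t in sorted(set(same_scores) | set(diff_scores)):
        # Miss rate: same-speaker trials scored below the threshold
        fnr = sum(s < t for s in same_scores) / len(same_scores)
        # False-alarm rate: different-speaker trials at or above the threshold
        fpr = sum(s >= t for s in diff_scores) / len(diff_scores)
        gap = abs(fnr - fpr)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (fnr + fpr) / 2
    return best_rate
```

A system whose log-LRs are all zero carries no information and gets a Cllr of 1; values well below 1 indicate informative, well-calibrated likelihood ratios. Running both functions separately on non-twin trials and on twin trials would give the kind of comparison the abstract reports.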
Pages: 341-351
Page count: 10