Voice Privacy Through x-vector and CycleGAN-based Anonymization

Cited by: 7
Authors
Prajapati, Gauri P. [1]
Singh, Dipesh K. [1]
Amin, Preet P. [1]
Patil, Hemant A. [1]
Affiliations
[1] Dhirubhai Ambani Inst Informat & Commun Technol D, Gandhinagar, Gujarat, India
Source
INTERSPEECH 2021 | 2021
Keywords
Voice privacy; voice anonymization; CycleGAN; TRANSFORMATION; SPEAKER;
DOI
10.21437/Interspeech.2021-1573
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
With the rise in usage of voice assistants and spoken language interfaces, important concerns regarding voice data privacy have been raised. In an attempt to reduce the threat of attacks on voice data, in this paper, we propose a speaker anonymization system based on CycleGAN. This method modifies the speaker's gender and accent information from the original speech signal. The proposed method gives a more natural-sounding anonymized voice in addition to a de-identified speaker. We have chosen baseline-1 of The Voice Privacy Challenge-2020 as our baseline system. CycleGAN training and the ASR and ASV experiments are performed on a subset of the Librispeech corpus. In this paper, the double anonymization technique is also explored, in which the CycleGAN-based anonymization technique is adopted on top of the baseline system. Experimental results show that combining the proposed method with the x-vector and neural source-filter (NSF) model-based method (baseline system) gives up to 5.61% relative improvement in EER for original-anonymized enroll-trial pairs. However, it gives up to 19.30% relative improvement in EER for anonymized-anonymized enroll-trial pairs. We observed that along with the good speaker de-identification, the anonymized utterances have adequate speech intelligibility and naturalness.
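The abstract reports its results as relative improvements in equal error rate (EER). As a reading aid, the sketch below shows one common way to estimate EER from speaker-verification scores and to express a change relative to a baseline. It is not the authors' evaluation code: the function names `compute_eer` and `relative_change`, the threshold sweep, and the toy scores are all assumptions made for illustration. It also assumes that, in an anonymization setting, a higher attacker EER is read as better privacy, so a positive relative change counts as an improvement.

```python
def compute_eer(target_scores, nontarget_scores):
    """Estimate the equal error rate by sweeping over observed score values.

    Assumption: higher scores mean "more likely the same speaker".
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection rate: genuine trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: impostor trials scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_change(baseline, proposed):
    """Relative change of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (proposed - baseline) / baseline


# Toy scores, invented purely for illustration (not data from the paper).
toy_target = [0.9, 0.8, 0.7, 0.4]
toy_nontarget = [0.6, 0.3, 0.2, 0.1]
toy_eer = compute_eer(toy_target, toy_nontarget)
```

With two EER values in hand, one for the baseline and one for the combined system, `relative_change(baseline_eer, combined_eer)` gives the kind of percentage quoted in the abstract, under the assumptions stated above.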
Pages: 1684-1688
Page count: 5
Related papers
30 in total
[1] Ahmed S, 2020, PROCEEDINGS OF THE 29TH USENIX SECURITY SYMPOSIUM, P2703
[2] [Anonymous], 2017, IEEE I CONF COMP VIS, DOI 10.1109/ICCV.2017.244
[3] Dauphin YN, 2017, PR MACH LEARN RES, V70
[4] EUR-Lex, 2016, Regulation (EU) 2016/679 of the European Parliament and of the Council
[5] Fang F., 2019, arXiv
[6] Goodfellow IJ, 2014, ADV NEUR IN, V27, P2672
[7] Gupta P, 2020, ASIAPAC SIGN INFO PR, P543
[8] Hoy Matthew B., 2018, Medical Reference Services Quarterly, V37, P81, DOI 10.1080/02763869.2018.1404391
[9] Isola, Phillip; Zhu, Jun-Yan; Zhou, Tinghui; Efros, Alexei A. Image-to-Image Translation with Conditional Adversarial Networks [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 5967-5976
[10] Jin, Qin; Toth, Arthur R.; Schultz, Tanja; Black, Alan W. Speaker De-identification via Voice Transformation [J]. 2009 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION & UNDERSTANDING (ASRU 2009), 2009: 529-533