Voice Privacy Through x-vector and CycleGAN-based Anonymization

Times cited: 7

Authors:

Prajapati, Gauri P. [1]

Singh, Dipesh K. [1]

Amin, Preet P. [1]

Patil, Hemant A. [1]

Affiliation:

[1] Dhirubhai Ambani Inst Informat & Commun Technol D, Gandhinagar, Gujarat, India

Source:

INTERSPEECH 2021 | 2021

Keywords:

Voice privacy; voice anonymization; CycleGAN; TRANSFORMATION; SPEAKER;

DOI:

10.21437/Interspeech.2021-1573

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification code:

100104; 100213

Abstract:

The rising use of voice assistants and spoken language interfaces has raised important concerns about voice data privacy. To reduce the threat of attacks on voice data, in this paper we propose a speaker anonymization system based on CycleGAN. The method modifies the speaker's gender and accent information in the original speech signal, producing a more natural-sounding anonymized voice in addition to a de-identified speaker. We chose Baseline-1 of the Voice Privacy Challenge 2020 as our baseline system. CycleGAN training and the ASR and ASV experiments are performed on a subset of the LibriSpeech corpus. We also explore a double anonymization technique, in which CycleGAN-based anonymization is applied on top of the baseline system. Experimental results show that combining the proposed method with the x-vector and neural source-filter (NSF) model-based method (the baseline system) gives up to 5.61% relative improvement in EER for original-anonymized enroll-trial pairs, and up to 19.30% relative improvement in EER for anonymized-anonymized enroll-trial pairs. We observe that, along with good speaker de-identification, the anonymized utterances retain adequate speech intelligibility and naturalness.
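The abstract reports its results as relative EER improvements. As an illustrative sketch only, the Python below shows one simple way to approximate an equal error rate (EER) from ASV trial scores and how a relative change between two EERs is computed. It assumes that higher scores mean "same speaker" and that "relative improvement" means the relative increase of EER over the baseline (for anonymization, a higher EER means the ASV system is less able to link utterances to speakers). The function names `equal_error_rate` and `relative_eer_improvement` are hypothetical; this is not the paper's or the challenge's evaluation code.

```python
from typing import Sequence


def equal_error_rate(target: Sequence[float], nontarget: Sequence[float]) -> float:
    """Approximate the EER by sweeping thresholds over all observed scores.

    Assumption: higher scores mean "same speaker". At threshold t, a trial
    is accepted when score >= t. FRR is the fraction of target trials
    rejected; FAR is the fraction of non-target trials accepted. The EER is
    taken at the threshold where FRR and FAR are closest (their average).
    """
    if not target or not nontarget:
        raise ValueError("need at least one target and one non-target score")
    thresholds = sorted(set(target) | set(nontarget))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        frr = sum(s < t for s in target) / len(target)
        far = sum(s >= t for s in nontarget) / len(nontarget)
        gap = abs(frr - far)
        if gap < best_gap:
            best_gap, eer = gap, (frr + far) / 2
    return eer


def relative_eer_improvement(eer_baseline: float, eer_proposed: float) -> float:
    """Relative increase of EER over the baseline, in percent (assumed definition)."""
    if eer_baseline <= 0:
        raise ValueError("baseline EER must be positive")
    return 100.0 * (eer_proposed - eer_baseline) / eer_baseline
```

With hypothetical scores target = [0.9, 0.8, 0.7, 0.4] and non-target = [0.6, 0.3, 0.2, 0.1], `equal_error_rate` returns 0.25, and a hypothetical baseline EER of 10.0% rising to 11.0% corresponds to a 10% relative improvement. These numbers are illustrative and are not results from the paper; production evaluations typically interpolate the ROC curve rather than picking the closest observed threshold.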


Pages: 1684-1688

Number of pages: 5
