Optimizing Tandem Speaker Verification and Anti-Spoofing Systems

Cited by: 8
Authors
Kanervisto, Anssi [1]
Hautamaki, Ville [1, 2]
Kinnunen, Tomi [1]
Yamagishi, Junichi [3]
Affiliations
[1] University of Eastern Finland, School of Computing, Joensuu 80101, Finland
[2] National University of Singapore, Department of Electrical and Computer Engineering, Singapore 119077, Singapore
[3] National Institute of Informatics, Tokyo 1018430, Japan
Funding
Academy of Finland
Keywords
Costs; Measurement; Training; Security; Error analysis; Task analysis; Reinforcement learning; security; speaker recognition; spoof countermeasures; IDENTIFICATION;
DOI
10.1109/TASLP.2021.3138681
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Because automatic speaker verification (ASV) systems are vulnerable to spoofing attacks, they are typically used in conjunction with spoofing countermeasure (CM) systems to improve security. For example, the CM can first determine whether the input is genuine human speech, and the ASV can then determine whether this speech matches the claimed speaker's identity. The performance of such a tandem system can be measured with the tandem detection cost function (t-DCF). However, ASV and CM systems are usually trained separately, with different metrics and data, so their combined performance is never optimized directly. In this work, we propose to optimize the tandem system directly by creating a differentiable version of the t-DCF and by employing techniques from reinforcement learning. The results indicate that these approaches outperform fine-tuning, with our method providing a 20% relative improvement in t-DCF on the ASVspoof 2019 dataset in a constrained setting.
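The cascade and the idea of a differentiable tandem cost can be illustrated with a minimal sketch. It assumes a simplified expected-cost function in the spirit of the t-DCF (a prior-weighted miss rate on target trials plus prior-weighted false-acceptance rates on zero-effort impostor and spoofed trials), hypothetical prior and cost values, and hypothetical helper names (`tandem_cost`, `soft_tandem_cost`, `reinforce_grad_cm`). It is not the authors' formulation of the t-DCF or their training procedure. The soft version replaces each hard threshold with a sigmoid, one common way to obtain a differentiable surrogate. The last function shows a generic REINFORCE-style score-function gradient estimate as one example of a reinforcement-learning technique that needs no differentiable cost.

```python
import math
import random
from typing import Dict, List, Tuple

# Hypothetical class priors and error costs, chosen only for illustration;
# they are NOT the official ASVspoof 2019 t-DCF parameters.
PRIOR = {"target": 0.90, "nontarget": 0.05, "spoof": 0.05}
C_MISS, C_FA_NON, C_FA_SPOOF = 1.0, 10.0, 10.0

Trials = Dict[str, List[Tuple[float, float]]]  # class -> [(cm_score, asv_score), ...]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def weighted_cost(cls: str, accept_rate: float) -> float:
    """Prior-weighted cost of one trial class given its acceptance rate."""
    if cls == "target":
        return C_MISS * PRIOR[cls] * (1.0 - accept_rate)  # genuine target rejected
    cost = C_FA_NON if cls == "nontarget" else C_FA_SPOOF
    return cost * PRIOR[cls] * accept_rate                # impostor or spoof accepted


def tandem_cost(trials: Trials, tau_cm: float, tau_asv: float) -> float:
    """Hard cascade: accept only if the CM says human speech AND the ASV says same speaker."""
    total = 0.0
    for cls, pairs in trials.items():
        rate = sum(c > tau_cm and a > tau_asv for c, a in pairs) / len(pairs)
        total += weighted_cost(cls, rate)
    return total


def soft_tandem_cost(trials: Trials, tau_cm: float, tau_asv: float,
                     temperature: float = 0.1) -> float:
    """Differentiable surrogate: each step function is replaced by a sigmoid."""
    total = 0.0
    for cls, pairs in trials.items():
        rate = sum(sigmoid((c - tau_cm) / temperature) * sigmoid((a - tau_asv) / temperature)
                   for c, a in pairs) / len(pairs)
        total += weighted_cost(cls, rate)
    return total


def reinforce_grad_cm(cls: str, cm_logit: float, asv_logit: float,
                      n_samples: int, rng: random.Random) -> float:
    """REINFORCE (score-function) estimate of d E[cost] / d cm_logit for one trial,
    treating each system's accept decision as a Bernoulli(sigmoid(logit)) action."""
    p_cm, p_asv = sigmoid(cm_logit), sigmoid(asv_logit)
    total = 0.0
    for _ in range(n_samples):
        a_cm = rng.random() < p_cm
        a_asv = rng.random() < p_asv
        cost = weighted_cost(cls, float(a_cm and a_asv))
        total += cost * (float(a_cm) - p_cm)  # d log Bernoulli(a; p) / d logit = a - p
    return total / n_samples
```

The hard cost is piecewise constant in the thresholds (and in the scores), so its gradient is zero almost everywhere; the sigmoid surrogate is what makes gradient-based optimization possible. Lowering `temperature` brings the surrogate closer to the hard cost but flattens its gradients away from the threshold. The REINFORCE estimator needs only sampled decisions and their costs, at the price of higher-variance gradient estimates.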
Pages: 477-488
Page count: 12