Long Range Acoustic Features for Spoofed Speech Detection

Cited by: 34
Authors
Das, Rohan Kumar [1]
Yang, Jichen [1]
Li, Haizhou [1]
Affiliations
[1] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
Source
INTERSPEECH 2019 | 2019
Keywords
anti-spoofing; logical access; physical access; ASVspoof 2019 challenge; SPEAKER VERIFICATION; INSTANTANEOUS FREQUENCY; COUNTERMEASURES
DOI
10.21437/Interspeech.2019-1887
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject classification code
100104 ; 100213 ;
Abstract
Speaker verification systems are vulnerable to spoofing attacks in practice. High-quality recording and playback devices make replay attacks a real threat to speaker verification. In addition, advances in voice conversion and speech synthesis have produced perceptually natural-sounding speech. The ASVspoof 2019 challenge was organized to study the robustness of countermeasures against such attacks and covers two common attack modes, logical access and physical access. The former deals with synthetic attacks arising from voice conversion and text-to-speech techniques, whereas the latter deals with replay attacks. In this work, we explore several novel countermeasures based on long range acoustic features that are found to be effective for spoofing attack detection. These features capture different aspects of long range information because they are computed from subbands and the octave power spectrum, in contrast to the conventional computation from the linear power spectrum. The novel features are combined with other known features for improved detection of spoofing attacks. On the best combined system submitted to the challenge, we obtain a tandem detection cost function of 0.1264 and 0.1381 (equal error rate of 4.13% and 5.95%) for logical and physical access, respectively.
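The abstract reports results as an equal error rate (EER), the operating point at which the miss rate on bona fide speech equals the false alarm rate on spoofed speech. The following is a minimal sketch of that metric, not the challenge's official scoring code. It assumes that higher scores mean "more likely bona fide"; the function name `equal_error_rate` and the toy scores are hypothetical. The tandem detection cost function additionally depends on the speaker verification system's error rates and is not shown.

```python
import numpy as np


def equal_error_rate(bonafide_scores, spoof_scores):
    """Return (eer, threshold), assuming higher scores mean 'bona fide'."""
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))

    # Miss rate: fraction of bona fide trials rejected (score < threshold).
    miss = np.array([(bonafide < t).mean() for t in thresholds])
    # False alarm rate: fraction of spoofed trials accepted (score >= threshold).
    false_alarm = np.array([(spoof >= t).mean() for t in thresholds])

    # The EER is taken where the two error rates are closest to each other.
    idx = int(np.argmin(np.abs(miss - false_alarm)))
    eer = (miss[idx] + false_alarm[idx]) / 2.0
    return eer, thresholds[idx]


# Toy scores for illustration only (not data from the paper).
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.1, 0.2, 0.4, 0.6])
print(f"EER = {eer:.2%} at threshold {threshold}")
```

Because the sketch only evaluates thresholds at observed scores, it approximates the EER on small score sets; scoring toolkits typically interpolate between operating points.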
Pages: 1058-1062
Page count: 5
Related papers
37 in total
[1]  
[Anonymous], 2018, DIGITAL SIGNAL UNPUB
[2]  
[Anonymous], 2013, SLTC NEWSLETTER
[3]  
[Anonymous], 2019, ASVspoof 2019: Automatic Speaker Verification Spoofing and Countermeasures Challenge Evaluation Plan
[4]  
[Anonymous], 2018, P ODYSSEY 2018, DOI 10.1145/3159450.3159544
[5]  
[Anonymous], 2019, IEEE ACM T AUDIO SPE
[6]  
Brümmer N., 2013, The bosaris toolkit: Theory, algorithms and code for surviving the new dcf
[7]  
Das RK, 2018, ASIAPAC SIGN INFO PR, P1030, DOI 10.23919/APSIPA.2018.8659789
[8]   Development of Multi-Level Speech based Person Authentication System [J].
Das, Rohan Kumar ;
Jelil, Sarfaraz ;
Prasanna, S. R. Mahadeva .
JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2017, 88 (03) :259-271
[9]   COMPARISON OF PARAMETRIC REPRESENTATIONS FOR MONOSYLLABIC WORD RECOGNITION IN CONTINUOUSLY SPOKEN SENTENCES [J].
DAVIS, SB ;
MERMELSTEIN, P .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1980, 28 (04) :357-366
[10]   Evaluation of Speaker Verification Security and Detection of HMM-Based Synthetic Speech [J].
De Leon, Phillip L. ;
Pucher, Michael ;
Yamagishi, Junichi ;
Hernaez, Inma ;
Saratxaga, Ibon .
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (08) :2280-2290