Replay attack detection with complementary high-resolution information using end-to-end DNN for the ASVspoof 2019 Challenge

Cited by: 16
Authors
Jung, Jee-weon [1]
Shim, Hye-jin [1]
Heo, Hee-Soo [1]
Yu, Ha-Jin [1]
Affiliations
[1] Univ Seoul, Sch Comp Sci, Seoul, South Korea
Source
INTERSPEECH 2019 | 2019
Funding
National Research Foundation, Singapore;
Keywords
replay detection; anti-spoofing; speaker recognition; representation learning; deep neural networks;
DOI
10.21437/Interspeech.2019-1991
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
In this study, we concentrate on replacing the extraction of hand-crafted acoustic features with an end-to-end DNN that uses complementary high-resolution spectrograms. As a result of advances in audio devices, the typical characteristics of replayed speech assumed by conventional knowledge are altered or diminished in unknown replay configurations. Thus, it has become increasingly difficult to detect spoofed speech with a conventional knowledge-based approach. To detect undiscovered characteristics that reside in replayed speech, we directly input spectrograms into an end-to-end DNN without knowledge-based intervention. This study differs from existing spectrogram-based systems in two respects: complementary information and high resolution. Spectrograms carrying different information are explored, and it is shown that additional information such as phase can be complementary. High-resolution spectrograms are employed under the assumption that the difference between bona fide and replayed speech lies in the details. Additionally, to verify whether other features are complementary to spectrograms, we also examine raw waveforms and an i-vector-based system. Experiments conducted on the ASVspoof 2019 physical access challenge show promising results, with a t-DCF of 0.0570 and an equal error rate of 2.45% on the evaluation set.
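As an illustration of the kind of input the abstract describes, the following is a minimal sketch of computing a magnitude spectrogram together with a phase spectrogram at a relatively fine frequency resolution. It is not the authors' feature pipeline: the function name `stft_features`, the FFT size of 2048, the hop of 128 samples, the Hann window, and the 16 kHz test tone are all assumptions chosen for illustration, and the record does not state the actual settings.

```python
import numpy as np


def stft_features(signal, n_fft=2048, hop=128, win_len=None):
    """Return (magnitude, phase) spectrograms of shape (n_frames, n_fft // 2 + 1).

    All parameter defaults are illustrative assumptions, not values from the paper.
    A larger n_fft gives finer frequency resolution; a smaller hop gives finer
    time resolution.
    """
    if win_len is None:
        win_len = n_fft
    window = np.hanning(win_len)
    # Number of full windows that fit in the signal (no padding).
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + win_len] * window for i in range(n_frames)]
    )
    # One-sided FFT of each windowed frame.
    spec = np.fft.rfft(frames, n=n_fft, axis=1)
    # Magnitude and phase carry different information about the same frames.
    return np.abs(spec), np.angle(spec)


# Usage: a one-second 1 kHz tone sampled at 16 kHz (synthetic, hypothetical input).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
magnitude, phase = stft_features(tone)
print(magnitude.shape, phase.shape)
```

The two arrays could then be stacked as separate input channels or fed to separate models whose scores are combined; which of these the paper does is not stated in this record.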
Pages: 1083-1087
Number of pages: 5