Effectiveness of Speech Demodulation-Based Features for Replay Detection

Cited by: 40
Authors
Kamble, Madhu R. [1]
Tak, Hemlata [1]
Patil, Hemant A. [1]
Affiliations
[1] DA IICT, Speech Res Lab, Gandhinagar, Gujarat, India
Source
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018
Keywords
Spoofing; Hilbert transform; Teager energy operator; energy separation algorithm; AUTOMATIC SPEAKER VERIFICATION; ENERGY SEPARATION; COUNTERMEASURES; FREQUENCY;
DOI
10.21437/Interspeech.2018-1675
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Replay attacks pose a serious threat to Automatic Speaker Verification (ASV) systems. Speech can be modeled as an amplitude- and frequency-modulated (AM-FM) signal. In this paper, we explore speech demodulation-based features using the Hilbert transform (HT) and the Teager Energy Operator (TEO) for replay detection. In particular, we propose HT-based Instantaneous Amplitude (IA) and Instantaneous Frequency (IF) Cosine Coefficients (i.e., HT-IACC and HT-IFCC) and Energy Separation Algorithm (ESA)-based features (i.e., ESA-IACC and ESA-IFCC). To track instantaneous energy at a given sampling frequency, ESA requires only 3 samples, whereas HT requires a relatively large number of samples; thus, ESA gives higher time resolution. The experiments were performed on the ASVspoof 2017 Challenge database for replay spoof speech detection (SSD). The experimental results show that ESA-based features gave a lower EER than HT-based features. In addition, a linearly spaced Gabor filterbank gave a lower EER than a Butterworth filterbank. To explore possible complementary information in amplitude and frequency, we used score-level fusion of IA and IF. With the HT-based feature set, score-level fusion gave an EER of 5.24% (dev) and 10.03% (eval), whereas the ESA-based feature set reduced the EER to 2.01% (dev) and 9.64% (eval).
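The 3-sample property mentioned in the abstract can be illustrated with a minimal sketch of the Teager Energy Operator and one common energy separation variant, DESA-2. This is an assumption for illustration only: the abstract does not say which ESA variant the authors used, and the function names `teo` and `desa2`, the test tone, and the sampling rate below are hypothetical. The sketch omits the filterbank and cosine-coefficient stages entirely.

```python
import math

def teo(x):
    """Teager energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1].

    Each output value depends on only 3 consecutive input samples.
    Output index i corresponds to input index n = i + 1.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

def desa2(x, eps=1e-12):
    """DESA-2 estimates of instantaneous amplitude and frequency.

    Assumed variant (not confirmed by the source). Returns two lists:
    amplitude envelope and frequency in radians per sample, both
    aligned to input indices n = 2 .. len(x) - 3.
    """
    # Symmetric difference y[n] = x[n+1] - x[n-1], for n = 1 .. N-2
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]
    psi_x = teo(x)  # psi_x[i] <-> n = i + 1
    psi_y = teo(y)  # psi_y[k] <-> n = k + 2
    amp, omega = [], []
    for k in range(len(psi_y)):
        px, py = psi_x[k + 1], psi_y[k]
        if px <= eps or py <= eps:
            # Energy too small (or negative) for a stable estimate
            amp.append(0.0)
            omega.append(0.0)
            continue
        arg = max(-1.0, min(1.0, 1.0 - py / (2.0 * px)))
        omega.append(0.5 * math.acos(arg))
        amp.append(2.0 * px / math.sqrt(py))
    return amp, omega

# Hypothetical test signal: a 1 kHz tone of amplitude 0.5 sampled at 16 kHz
fs, f0, a0 = 16000, 1000.0, 0.5
tone = [a0 * math.cos(2 * math.pi * f0 * n / fs + 0.3) for n in range(200)]
ia, om = desa2(tone)
f_hz = [w * fs / (2 * math.pi) for w in om]
print(ia[50], f_hz[50])
```

For a stationary tone, the estimates recover the amplitude and frequency of the input. On real speech the operator would normally be applied to narrowband subband signals, since the energy separation relations assume a single AM-FM component.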
Pages: 641-645
Page count: 5