Deep Convolutional Neural Network for Voice Liveness Detection

Times cited: 0
Authors
Gupta, Siddhant [1]
Khoria, Kuldeep [1]
Patil, Ankur T. [1]
Patil, Hemant A. [1]
Affiliation
[1] Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, Gujarat, India
Source
2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) | 2021
Keywords
Voice liveness detection; Pop noise; CNN; POCO dataset
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this work, we present a system that detects voice liveness by identifying pop noise in the speech signal, in order to prevent security breaches of automatic speaker verification (ASV) systems. Pop noise is produced by spontaneous breathing while uttering certain phonemes, and it has low-frequency characteristics. Given these low-frequency characteristics, we use the low-frequency content (0-40 Hz) of the short-time Fourier transform (STFT) as the feature set and a convolutional neural network (CNN) as the classifier. The experiments are performed on the recently released POp noise COrpus (POCO) dataset. The approach proposed in the original POCO dataset paper serves as the baseline against which the proposed architecture is compared. Performance is measured using both 10-fold cross-validation and a customized disjoint partition of the dataset. The proposed architecture improves voice liveness detection accuracy in both cases: it obtains absolute improvements in accuracy of 19.86% and 18.22% over the baseline for 10-fold cross-validation and the customized data partition, respectively.
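The feature-extraction step described in the abstract (keeping only the 0-40 Hz content of the STFT) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `low_freq_stft`, the 16 kHz sampling rate, the 1 s Hann-windowed frames with 50% overlap, and the log-magnitude compression are all hypothetical choices. The frame length matters because the bin spacing is fs / frame_len: a 1 s frame gives 1 Hz spacing and 41 bins in 0-40 Hz, whereas a typical 25 ms frame at 16 kHz has 40 Hz spacing and would leave only two bins (0 and 40 Hz).

```python
import numpy as np


def low_freq_stft(x, fs, frame_len, hop, f_max=40.0, eps=1e-10):
    """Log-magnitude STFT restricted to bins at or below f_max (Hz).

    Hypothetical helper: frame_len, hop, the Hann window and the log
    compression are illustrative assumptions, not values from the paper.
    Returns (features, freqs) with features shaped (n_low_bins, n_frames).
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        # Zero-pad short signals so that at least one frame exists.
        x = np.pad(x, (0, frame_len - len(x)))
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Overlapping, windowed frames stacked as rows: (n_frames, frame_len).
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectrum = np.abs(np.fft.rfft(frames, axis=1))
    # Bin centre frequencies in Hz (bin spacing is fs / frame_len).
    freqs = np.arange(frame_len // 2 + 1) * fs / frame_len
    # Keep only the low-frequency bins where pop noise is expected.
    low = freqs <= f_max
    return np.log(spectrum[:, low] + eps).T, freqs[low]


# Usage on a synthetic signal (assumed 16 kHz): a weak 20 Hz component,
# standing in for low-frequency energy, plus a strong 1 kHz tone.
fs = 16000
t = np.arange(3 * fs) / fs
x = 0.1 * np.sin(2 * np.pi * 20 * t) + np.sin(2 * np.pi * 1000 * t)
feats, freqs = low_freq_stft(x, fs, frame_len=fs, hop=fs // 2)
print(feats.shape)  # 41 low-frequency bins x 5 frames
```

The 1 kHz tone is discarded entirely, and the peak in every frame falls on the 20 Hz bin. The resulting frequency-by-time matrix is the kind of input a CNN classifier would receive; the CNN itself is not sketched because the abstract does not specify its architecture.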
Pages: 775-779
Number of pages: 5
Related papers
19 in total
[1] Ai, Yang; Ling, Zhen-Hua. Knowledge-and-Data-Driven Amplitude Spectrum Prediction for Hierarchical Neural Vocoders. INTERSPEECH 2020, 2020: 190-194.
[2] Akimoto, Kosuke; Liew, Seng Pei; Mishima, Sakiko; Mizushima, Ryo; Lee, Kong Aik. POCO: a Voice Spoofing and Liveness Detection Corpus based on Pop Noise. INTERSPEECH 2020, 2020: 1081-1085.
[3] [Anonymous]. Discrete-Time Speech Signal Processing: Principles and Practice. 2006.
[4] Delgado, H. ODYSSEY 2018 THE SPE. 2018.
[5] Ding, Shaojin; Zhao, Guanlong; Gutierrez-Osuna, Ricardo. Improving the Speaker Identity of Non-Parallel Many-to-Many Voice Conversion with Adversarial Speaker Recognition. INTERSPEECH 2020, 2020: 776-780.
[6] Hsu, Po-chun; Lee, Hung-yi. WG-WaveNet: Real-Time High-Fidelity Speech Synthesis without GPU. INTERSPEECH 2020, 2020: 210-214.
[7] Jain, Anil K.; Nandakumar, Karthik; Ross, Arun. 50 Years of Biometric Research: Accomplishments, Challenges, and Opportunities. Pattern Recognition Letters, 2016, 79: 80-105.
[8] Kinnunen, Tomi; Sahidullah, Md; Delgado, Hector; Todisco, Massimiliano; Evans, Nicholas; Yamagishi, Junichi; Lee, Kong Aik. The ASVspoof 2017 Challenge: Assessing the Limits of Replay Spoofing Attack Detection. 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), Vols 1-6: Situated Interaction, 2017: 2-6.
[9] Kinnunen, T. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017: 5395. DOI: 10.1109/ICASSP.2017.7953187.
[10] Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E. ImageNet Classification with Deep Convolutional Neural Networks. Communications of the ACM, 2017, 60(6): 84-90.