Convolutional neural network-based detection of audio replay attacks in speaker verification systems

Cited by: 0

Authors:

Khamis A. Al-Karawi [1]

Affiliations:

[1] University of Diyala,

Source:

International Journal of Speech Technology | 2025 / Vol. 28 / Issue 1

Keywords:

Voice Spoofing; Anti-spoofing; Feature extraction; Speaker recognition; Replay attack detection; CNN; Deep learning;

DOI:

10.1007/s10772-025-10173-5

Abstract:

Replay attacks, in which recorded audio is used to spoof speaker verification systems, are a common and inexpensive threat that requires minimal technical expertise. This article presents a method for detecting replay attacks in speaker verification systems that combines a convolutional neural network (CNN) with non-voiced audio segments. The reverberation and channel noise found in non-voiced sections can help separate genuine audio from replayed audio. The CNN is trained on Fast Fourier Transform (FFT) spectrograms of these segments to judge audio authenticity; eliminating the spoken components shrinks the feature set and shortens training without affecting detection accuracy. Model performance is assessed on the ASVspoof 2019 dataset using the Equal Error Rate (EER) metric, obtaining an EER of 6.75% on the development set and 14.31% on the evaluation set, which suggests a strong capacity to separate genuine audio from replayed audio.
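
For readers unfamiliar with the EER metric mentioned in the abstract, the following is a minimal sketch of how it can be computed from detector scores. It is not the article's evaluation code. The function name `equal_error_rate`, the convention that higher scores mean "more likely genuine", and the toy scores are all assumptions made for illustration. The EER is the operating point at which the false acceptance rate (replayed audio accepted) equals the false rejection rate (genuine audio rejected); on a finite score set the two rates rarely match exactly, so the sketch takes the threshold where they are closest and averages them.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER from two lists of detector scores.

    Assumption: a higher score means the detector considers the
    audio more likely to be genuine (not replayed).
    """
    if not genuine_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))

    best_gap = float("inf")
    eer = 1.0
    for t in thresholds:
        # False acceptance: replayed audio scoring at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection: genuine audio scoring below the threshold.
        frr = sum(g < t for g in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2
    return eer


# Hypothetical toy scores, not taken from the article.
genuine = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine, spoof))  # 0.25
```

In the toy example, a threshold of 0.6 accepts one of four replayed samples and rejects one of four genuine samples, so both error rates are 25%. An EER of 6.75% or 14.31%, as reported above, would correspond to the same balance point on the respective ASVspoof 2019 subsets.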

Pages: 175-184

Page count: 9