Time Series-based Spoof Speech Detection Using Long Short-term Memory and Bidirectional Long Short-term Memory

Cited by: 0

Authors:

Mirza, Arsalan R. [1]

Al-Talabani, Abdulbasit K. [2]

Affiliations:

[1] Soran Univ, Fac Sci, Dept Comp Sci, Soran, Kurdistan, Iraq

[2] Koya Univ, Dept Software Engn, Fac Engn, Koya KOY45, Kurdistan, Iraq

Source:

ARO-THE SCIENTIFIC JOURNAL OF KOYA UNIVERSITY | 2024 / Vol. 12 / Issue 02

Keywords:

Long short-term memory; Constant Q cepstral coefficients; Countermeasure spoofing; Mel-frequency cepstral coefficients; Open-source speech and music interpretation by large-space extraction; Bidirectional long short-term memory; SPEAKER; COUNTERMEASURES; ASVSPOOF;

DOI:

10.14500/aro.11636

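Chinese Library Classification (CLC) number:

O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];

Discipline classification code:

07 ; 0710 ; 09 ;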

Abstract:

Detecting fake speech in voice-based authentication systems is crucial for reliability. Traditional methods often struggle because they cannot handle complex patterns that unfold over time. This study introduces an advanced approach using deep learning, specifically long short-term memory (LSTM) and bidirectional LSTM (BiLSTM) models, tailored for identifying fake speech based on its temporal characteristics. We use speech signals with cepstral features such as mel-frequency cepstral coefficients (MFCC), constant Q cepstral coefficients, and open-source speech and music interpretation by large-space extraction to directly learn these patterns. Testing on the ASVspoof 2019 Logical Access dataset, we focus on metrics such as min-tDCF, equal error rate, recall, precision, and F1-score. Our results show that LSTM and BiLSTM models significantly enhance the reliability of spoof detection systems.
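As an illustration of how a spoof detector's scores are typically summarized, the sketch below computes an equal error rate (EER), a standard metric for ASVspoof countermeasures, from two lists of scores. It is a minimal example and not the authors' evaluation code: the function name `equal_error_rate`, the toy scores, and the convention that a higher score means "more likely bona fide" are all assumptions made here for illustration.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Return (eer, threshold), assuming higher scores mean 'more bona fide'.

    Hypothetical helper for illustration: it scans every observed score as a
    candidate threshold and keeps the one where the false acceptance and
    false rejection rates are closest to each other.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of the two rates at the closest crossing.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores (made up): one bona fide trial and one spoofed trial overlap.
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With real systems the scores would come from the classifier's output on an evaluation set; interpolating between thresholds gives a smoother estimate than this discrete scan.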

Citation

Pages: 119-129

Page count: 11

References: 33 in total

