Morse wavelet transform-based features for voice liveness detection

Cited by: 1
Authors
Gupta, Priyanka [1]
Patil, Hemant A. [1]
Affiliations
[1] Dhirubhai Ambani Inst Informat & Commun Technol, Speech Res Lab, Gandhinagar 382007, India
Source
COMPUTER SPEECH AND LANGUAGE | 2024 / Vol. 84
Keywords
Automatic speaker verification; Voice liveness detection; Morse wavelet; Pop noise; Scalogram; CNN;
DOI
10.1016/j.csl.2023.101571
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
The need for Voice Liveness Detection (VLD) has emerged particularly for the security of Automatic Speaker Verification (ASV) systems. Existing Spoofed Speech Detection (SSD) systems rely on attack-specific approaches to detect spoofed speech. However, to safeguard ASV systems against all kinds of spoofing attacks (known as well as unknown), it is important to determine whether speech is uttered live (genuine) or not. To that effect, in this work, we propose the detection of pop noise using the Morse wavelet for the VLD task. Pop noise is a discriminative acoustic cue that is present in live speech and is absent or diminished in spoofed speech. It is captured by the microphone in the form of sudden bursts of air from a live speaker's mouth, owing to the close proximity of the speaker to the microphone. To validate this hypothesis, we present an analysis of pop noise energy with respect to distance and find that it decreases exponentially with distance. Furthermore, pop noise is reported to be present in very low frequency regions. To capture the pop noise effectively, we propose to exploit the excellent frequency resolution of the Continuous Wavelet Transform (CWT) using Generalized Morse Wavelets (GMWs). GMWs are a superfamily of analytic wavelets. We analyse the suitability of GMWs for pop noise detection for the VLD task using the POp noise COrpus (POCO). The wavelet parameters are fine-tuned according to the VLD task. Furthermore, the performance of the VLD system is evaluated for various frequency subbands, and it is observed that the subband of 1 to 50 Hz gives the best accuracy of 90.55% and 88.43% on the Dev and Eval sets, respectively. In addition, phoneme-based analysis shows the dependence of the performance of the VLD system on the type of phonemes in the utterances. It is shown that phonemes such as plosives and fricatives show distinct pop noise as compared to other phonemes.
Furthermore, the extension of the POCO dataset is used for experiments in which simulated reverberation is added to spoofed signals, assuming the attacker (or the recording device) is positioned at various distances. This allows the effect of speaker-attacker distance to be studied. Similar to the previous results, it is observed that for the reverberated case too, the optimal frequency subband for the VLD task is 1 to 50 Hz, across all the distances. The proposed feature set is evaluated using three classifiers, namely, Convolutional Neural Network (CNN), Light CNN (LCNN), and Residual Neural Network (ResNet), on the POCO dataset as well as the reverberated POCO dataset. It is observed that CNN gives the highest accuracy of 88.43% on the Eval set of the POCO dataset. The proposed features are also evaluated under the assumptions of two ideal scenarios: when the ASV system is strictly under attack, and when it is strictly not under attack. It is observed that the proposed Morse wavelet-based VLD system rejected 89% of the spoofed utterances and accepted 88.30% of the genuine utterances.
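As an illustration of the kind of feature the abstract describes, the following is a minimal sketch of a generalized Morse wavelet scalogram restricted to the 1 to 50 Hz subband. It is not the authors' implementation. The function name `morse_scalogram`, the parameter values `gamma=3.0` and `beta=20.0`, and the 1 Hz spacing of the analysis frequencies are assumptions made for this example; the abstract states only that the wavelet parameters were fine-tuned for the VLD task. The wavelet is built directly in the frequency domain from its standard definition, Psi(w) = a * w^beta * exp(-w^gamma) for w > 0 and zero otherwise, and the CWT is computed with NumPy FFTs.

```python
import numpy as np


def morse_scalogram(x, fs, freqs_hz, gamma=3.0, beta=20.0):
    """Magnitude of a CWT using a generalized Morse wavelet.

    gamma and beta are illustrative defaults (assumptions), not values
    taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    omega = 2.0 * np.pi * np.fft.fftfreq(n, d=1.0 / fs)  # rad/s
    spectrum = np.fft.fft(x)

    # Peak (radian) frequency of the mother wavelet.
    omega_peak = (beta / gamma) ** (1.0 / gamma)
    # Log of the normalisation a = 2 * (e * gamma / beta) ** (beta / gamma),
    # which makes the wavelet's peak value in the frequency domain equal 2.
    log_a = np.log(2.0) + (beta / gamma) * (1.0 + np.log(gamma / beta))

    positive = omega > 0  # analytic wavelet: no negative frequencies
    scalogram = np.empty((len(freqs_hz), n))
    for i, fc in enumerate(freqs_hz):
        scale = omega_peak / (2.0 * np.pi * fc)  # centre the wavelet on fc
        w = scale * omega[positive]
        psi = np.zeros(n)
        psi[positive] = np.exp(log_a + beta * np.log(w) - w ** gamma)
        scalogram[i] = np.abs(np.fft.ifft(spectrum * psi))
    return scalogram


if __name__ == "__main__":
    fs = 1000.0                      # hypothetical sampling rate
    t = np.arange(0, 4.0, 1.0 / fs)
    signal = np.cos(2.0 * np.pi * 20.0 * t)   # synthetic low-frequency tone
    band = np.arange(1, 51)          # 1 to 50 Hz, 1 Hz steps (assumption)
    s = morse_scalogram(signal, fs, band)
    print(s.shape, band[np.argmax(s.mean(axis=1))])
```

The resulting array (frequency by time) is the sort of low-frequency scalogram that could be passed to a CNN-style classifier; the classifier architectures themselves are not specified in the abstract and are not sketched here.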
Pages: 23
Related papers
50 in total
  • [1] Voice Liveness Detection using Constant-Q Transform-Based Features
    Patil, Ankur T.
    Khoria, Kuldeep
    Patil, Hemant A.
    2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022, : 110 - 114
  • [2] Obstructive sleep apnea detection using discrete wavelet transform-based statistical features
    Rajesh, Kandala. N. V. P. S.
    Dhuli, Ravindra
    Kumar, T. Sunil
    COMPUTERS IN BIOLOGY AND MEDICINE, 2021, 130
  • [3] A Discrete Wavelet Transform-Based Voice Activity Detection and Noise Classification With Sub-Band Selection
    Abdullah, Salinna
    Zamani, Majid
    Demosthenous, Andreas
    2021 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2021,
  • [4] An Continuous Wavelet Transform-Based Detection Approach to Traffic Anomalies
    Jiang, Dingde
    Yao, Cheng
    Xu, Zhengzheng
    Zhang, Peng
    Yuan, Zhen
    Qin, Wenda
    MECHANICAL AND ELECTRONICS ENGINEERING III, PTS 1-5, 2012, 130-134 : 2098 - 2102
  • [5] Wavelet transform-based ground fault detection for LVDC microgrid
    Lee K.-M.
    Park C.-W.
    Transactions of the Korean Institute of Electrical Engineers, 2021, 70 (09): : 1289 - 1294
  • [6] Scattering transform-based features for the automatic seizure detection
    Jiang, Yun
    Chen, Wanzhong
    You, Yang
    BIOCYBERNETICS AND BIOMEDICAL ENGINEERING, 2020, 40 (01) : 77 - 89
  • [7] Voice Liveness Detection Using Bump Wavelet with CNN
    Gupta, Priyanka
    Gupta, Siddhant
    Patil, Hemant A.
    PATTERN RECOGNITION AND MACHINE INTELLIGENCE, PREMI 2021, 2024, 13102 : 91 - 98
  • [8] Detection of epilepsy using discrete cosine harmonic wavelet transform-based features and neural network classifier
    Kiranmayi, G. R.
    Udayashankara, V.
    INTERNATIONAL JOURNAL OF BIOMEDICAL ENGINEERING AND TECHNOLOGY, 2020, 32 (02) : 109 - 122
  • [9] A robust voice activity detection based on wavelet transform
    Aghajani, Kh.
    Manzuri, M. T.
    Karami, M.
    Tayebi, H.
    2008 SECOND INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING, 2008, : 37 - +
  • [10] Discrete Wavelet Transform-Based Detection Transformer for Battery Weld Defect Detection
    Zhang, Kang
    Liao, Limin
    Wang, Yonghua
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2025, 74