Power Wavelet Cepstral Coefficients (PWCC): An Accurate Auditory Model-Based Feature Extraction Method for Robust Speaker Recognition

Times cited: 0
Authors
Zouhir, Youssef [1, 2]
Zarka, Mohamed [3]
Ouni, Kais [1, 2]
Amraoui, Lilia El [4]
Affiliations
[1] Univ Carthage, Natl Engn Sch Carthage, Res Lab Smart Elect, Tunis 2035, Tunisia
[2] Univ Carthage, Natl Engn Sch Carthage, SE&ICT Lab, ICT, LR18ES44, Tunis 2035, Tunisia
[3] King Khalid Univ, Appl Coll Tanumah, Dept Comp Sci, Muhayil 61913, Saudi Arabia
[4] Princess Nourah Bint Abdulrahman Univ, Coll Engn, Dept Elect Engn, POB 84428, Riyadh 11671, Saudi Arabia
Keywords
Feature extraction; Mel frequency cepstral coefficient; Accuracy; Wavelet transforms; Noise measurement; Filters; Computational modeling; Time-frequency analysis; Adaptation models; Speech recognition; Speaker recognition; Machine learning; GMM-UBM; MFCC; PNCC; Power wavelet cepstral coefficients (PWCC); Noise robustness; Wavelet transform; Auditory models; Cochlear filtering; Biometric authentication; Noise; Frequency
DOI
10.1109/ACCESS.2025.3576659
CLC number (Chinese Library Classification)
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Human speaker recognition (SR) ability exceeds that of recent machine learning approaches, even in noisy environments. To bridge this gap, researchers draw on the human auditory system to improve the performance of machine learning algorithms. This paper introduces a novel feature extraction method, named "Power Wavelet Cepstral Coefficients" (PWCC), to enhance SR accuracy. The method is built on a "Normalized Wavelet FilterBank" (NWFB) that uses an "Equivalent Rectangular Bandwidth" rate (ERB-rate) scale, and it additionally integrates a "Noise Suppression Module" (NSM). The NWFB imitates the frequency selectivity of the cochlea using Morlet wavelet filters arranged on the ERB-rate scale. To reduce the impact of noise, the NSM applies medium-duration power analysis, an asymmetric noise-suppression stage that incorporates a temporal masking component, and spectral smoothing. To assess the proposed PWCC method, experiments were conducted on clean speech from the TIMIT database corrupted with various noise types from the AURORA dataset. With a "Gaussian Mixture Model-Universal Background Model" (GMM-UBM) classifier, PWCC achieved higher SR accuracy in noisy environments than traditional features such as Power-Normalized Cepstral Coefficients (PNCC) and Mel-Frequency Cepstral Coefficients (MFCC). PWCC also yielded higher precision, recall, and F1-scores than PNCC and MFCC overall across the noise conditions tested. For instance, with babble noise at 15 dB SNR, PWCC achieved a recognition rate of 92.06%, compared with 75.24% for PNCC and 68.33% for MFCC.
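The abstract describes the NWFB only at a high level. The sketch below shows one plausible way to place Morlet-style filters on the ERB-rate scale, assuming: Gaussian magnitude responses (the spectrum of a Morlet wavelet is approximately a Gaussian around its center frequency), each filter's bandwidth set so that its power response has an ERB equal to the Glasberg-Moore ERB at its center frequency, and peak normalization to unit gain. These choices, and all function names, are hypothetical illustrations rather than the authors' implementation.

```python
import math


def hz_to_erb_rate(f_hz: float) -> float:
    # Glasberg & Moore (1990) ERB-rate: number of ERBs below f_hz.
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_rate_to_hz(e: float) -> float:
    # Inverse of hz_to_erb_rate.
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437


def erb_bandwidth(f_hz: float) -> float:
    # Equivalent rectangular bandwidth (Hz) of the auditory filter at f_hz.
    return 24.7 * (0.00437 * f_hz + 1.0)


def erb_center_frequencies(f_min: float, f_max: float, n_filters: int) -> list[float]:
    # Center frequencies equally spaced on the ERB-rate scale (n_filters >= 2).
    e_lo, e_hi = hz_to_erb_rate(f_min), hz_to_erb_rate(f_max)
    step = (e_hi - e_lo) / (n_filters - 1)
    return [erb_rate_to_hz(e_lo + i * step) for i in range(n_filters)]


def morlet_gain(f_hz: float, fc: float) -> float:
    # Hypothetical peak-normalized Gaussian (Morlet-style) magnitude response.
    # sigma is chosen so the power response |H|^2 = exp(-(f - fc)^2 / sigma^2)
    # has an ERB of sigma * sqrt(pi), equal to erb_bandwidth(fc).
    sigma = erb_bandwidth(fc) / math.sqrt(math.pi)
    return math.exp(-((f_hz - fc) ** 2) / (2.0 * sigma ** 2))


def filterbank_energies(power_spectrum: list[float], sample_rate: float,
                        centers: list[float]) -> list[float]:
    # power_spectrum is one-sided: len = n_fft // 2 + 1 for an even n_fft.
    n_fft = 2 * (len(power_spectrum) - 1)
    freqs = [k * sample_rate / n_fft for k in range(len(power_spectrum))]
    # Each channel sums the spectrum weighted by its squared (power) gain.
    return [sum(p * morlet_gain(f, fc) ** 2 for p, f in zip(power_spectrum, freqs))
            for fc in centers]
```

For example, `erb_center_frequencies(100.0, 8000.0, 40)` yields 40 center frequencies that are closely spaced at low frequencies and widen toward 8 kHz, mirroring the cochlea's finer resolution at low frequencies.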
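The NSM stages named in the abstract resemble the processing used in PNCC. The sketch below covers only the first two ideas for a single filterbank channel, under stated assumptions: medium-duration power is a moving average over 2M + 1 frames, and the noise floor is tracked by an asymmetric low-pass filter (slow rise, fast fall) that is subtracted with half-wave rectification. The parameter values (M = 2, λa = 0.999, λb = 0.5, initialization at 0.9 of the first value) are assumptions of the kind used in PNCC-style processing, not values reported for PWCC; temporal masking and spectral smoothing are omitted, and the function names are hypothetical.

```python
def medium_duration_power(power: list[float], m: int = 2) -> list[float]:
    # Moving average of one channel's frame powers over 2m + 1 frames;
    # the window is truncated at the start and end of the utterance.
    n = len(power)
    out = []
    for i in range(n):
        window = power[max(0, i - m):min(n, i + m + 1)]
        out.append(sum(window) / len(window))
    return out


def asymmetric_lowpass(x: list[float], lam_a: float = 0.999,
                       lam_b: float = 0.5) -> list[float]:
    # Tracks the lower envelope (noise floor): when the input is above the
    # current estimate it rises slowly (lam_a); otherwise it falls quickly (lam_b).
    y = [0.9 * x[0]]  # assumed initialization
    for v in x[1:]:
        lam = lam_a if v >= y[-1] else lam_b
        y.append(lam * y[-1] + (1.0 - lam) * v)
    return y


def suppress_noise(power: list[float], m: int = 2) -> list[float]:
    # Subtract the estimated floor from the medium-duration power and
    # half-wave rectify. Temporal masking and spectral smoothing are omitted.
    q = medium_duration_power(power, m)
    floor = asymmetric_lowpass(q)
    return [max(qi - fi, 0.0) for qi, fi in zip(q, floor)]
```

Because the floor estimate rises very slowly, a short speech burst passes through almost unchanged, while a steady background level is largely removed; this asymmetry is the design choice that separates slowly varying noise from speech.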
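The abstract reports recognition rate, precision, recall, and F1-score for speaker identification but does not state how the per-speaker scores are averaged. The sketch below assumes per-speaker scores that are macro-averaged over speakers, and takes the recognition rate to be the fraction of test utterances attributed to the correct speaker; both are assumptions, and `speaker_id_scores` is a hypothetical helper.

```python
from collections import Counter


def speaker_id_scores(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    # Per-speaker counts of true positives, false positives and false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # utterance wrongly attributed to speaker `pred`
            fn[truth] += 1  # utterance of speaker `truth` that was missed
    speakers = sorted(set(y_true) | set(y_pred))
    prec, rec, f1 = [], [], []
    for s in speakers:
        p = tp[s] / (tp[s] + fp[s]) if tp[s] + fp[s] else 0.0
        r = tp[s] / (tp[s] + fn[s]) if tp[s] + fn[s] else 0.0
        prec.append(p)
        rec.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    n = len(speakers)
    # Macro averages weight every speaker equally, regardless of test-set size.
    return {
        "recognition_rate": sum(tp.values()) / len(y_true),
        "precision": sum(prec) / n,
        "recall": sum(rec) / n,
        "f1": sum(f1) / n,
    }
```

For instance, with true labels `["a", "a", "b", "b"]` and predictions `["a", "b", "b", "b"]`, three of four utterances are correct, so the recognition rate is 0.75, while macro precision is 5/6 because speaker "b" absorbs one false acceptance.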
Pages: 102323-102338
Number of pages: 16