Reverb and Noise as Real-World Effects in Speech Recognition Models: A Study and a Proposal of a Feature Set

Times cited: 2

Authors:

Cesarini, Valerio [1]

Costantini, Giovanni [1]
Affiliation:

[1] Univ Roma Tor Vergata, Dept Elect Engn, I-00133 Rome, Italy

Source:

APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 23

Keywords:

speaker recognition; data augmentation; noise; reverb; MFCC; RASTA; speaker verification; SVM

DOI:

10.3390/app142311446

Chinese Library Classification (CLC) number:

O6 [Chemistry]

Discipline classification code:

0703

Abstract:

Reverberation and background noise are common, unavoidable real-world phenomena that hinder automatic speaker recognition systems, particularly because these systems are typically trained on noise-free data. Most models rely on fixed audio feature sets. To evaluate how strongly features depend on reverberation and noise, this study proposes augmenting the commonly used mel-frequency cepstral coefficients (MFCCs) with relative spectral (RASTA) features. The performance of these features was assessed on noisy data generated by applying reverberation and pink noise to the DEMoS dataset, which includes 56 speakers. Verification models were trained on clean data using MFCCs, RASTA features, or their combination as inputs, and were then validated on augmented data with progressively increasing levels of noise and reverberation. The results indicate that MFCCs struggle to identify the main speaker, whereas the RASTA method has difficulty with the opposite class. The hybrid feature set derived from their combination achieves the best overall performance as a compromise between the two. Although MFCCs are the standard and perform well on clean training data, they show a significant tendency to misclassify the main speaker in real-world scenarios, a critical limitation for modern user-centric verification applications. The hybrid feature set therefore proves effective as a balanced solution, optimizing both sensitivity and specificity.
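The abstract compresses several signal-processing steps into one paragraph. The sketch below illustrates them under stated assumptions; it is not the authors' code. It assumes NumPy; a hypothetical synthetic room impulse response (exponentially decaying noise with a chosen RT60), since the paper's actual reverberation method, room parameters, and noise levels are not given in this record; pink noise produced by 1/f spectral shaping of white noise and mixed at a target SNR; the classic RASTA band-pass filter (Hermansky and Morgan) applied along time to log band energies, with pole 0.98 (some implementations use 0.94); and sensitivity/specificity with the main speaker as the positive class. All function names are hypothetical.

```python
import numpy as np


def pink_noise(n: int, rng: np.random.Generator) -> np.ndarray:
    """Unit-variance pink noise (power spectral density ~ 1/f); needs n >= 2."""
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n)
    freqs[0] = freqs[1]  # avoid dividing by zero at DC
    spectrum /= np.sqrt(freqs)  # amplitude ~ 1/sqrt(f), so power ~ 1/f
    noise = np.fft.irfft(spectrum, n)
    return noise / np.std(noise)


def add_noise_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so that 10*log10(P_clean / P_noise) == snr_db, then add it."""
    p_clean = np.mean(clean**2)
    p_noise = np.mean(noise**2)
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise


def synthetic_rir(rt60: float, sr: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical room impulse response: noise whose envelope falls 60 dB in rt60 s."""
    t = np.arange(int(rt60 * sr)) / sr
    envelope = np.exp(-3 * np.log(10) * t / rt60)  # 20*log10(envelope) == -60 at t == rt60
    rir = rng.standard_normal(t.size) * envelope
    rir[0] = 1.0  # direct-path impulse
    return rir / np.max(np.abs(rir))


def add_reverb(clean: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve with the impulse response and trim the tail to the input length."""
    return np.convolve(clean, rir)[: clean.size]


def rasta_filter(log_bands: np.ndarray, pole: float = 0.98) -> np.ndarray:
    """Band-pass each band of a (frames, bands) log-energy array along time.

    Causal form of H(z) = 0.1 (2 + z^-1 - z^-3 - 2 z^-4) / (1 - pole z^-1),
    so the output lags the input by 4 frames.
    """
    b = np.array([0.2, 0.1, 0.0, -0.1, -0.2])  # sums to 0: constant offsets are removed
    x = np.asarray(log_bands, dtype=float)
    y = np.zeros_like(x)
    for t in range(x.shape[0]):
        for k, bk in enumerate(b):
            if t >= k:
                y[t] += bk * x[t - k]
        if t > 0:
            y[t] += pole * y[t - 1]
    return y


def sensitivity_specificity(y_true, y_pred):
    """Positive class (label 1) = the enrolled main speaker; 0 = any other speaker."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return float(tp / (tp + fn)), float(tn / (tn + fp))


# Usage: a 1 s tone stands in for a clean speech clip.
rng = np.random.default_rng(0)
sr = 16_000
clean = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)
reverberant = add_reverb(clean, synthetic_rir(0.5, sr, rng))
noisy = add_noise_at_snr(reverberant, pink_noise(sr, rng), snr_db=10.0)
```

Treating the main speaker as the positive class maps the abstract's findings onto these two numbers: the MFCC models' trouble identifying the main speaker appears as low sensitivity, and the RASTA models' trouble with the opposite class as low specificity. The RASTA recursion is written as an explicit loop for readability; `scipy.signal.lfilter(b, [1, -pole], x, axis=0)` computes the same filter with zero initial conditions.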

Pages: 23

References (51 in total):

[1] Data Augmentation and Deep Learning Methods in Sound Classification: A Systematic Review [J]. Abayomi-Alli, Olusola O.; Damasevicius, Robertas; Qazi, Atika; Adedoyin-Olowe, Mariam; Misra, Sanjay. ELECTRONICS, 2022, 11 (22)

[2] Enhancement of a text-independent speaker verification system by using feature combination and parallel structure classifiers [J]. Abdalmalak, Kerlos Atia; Gallardo-Antolin, Ascension. NEURAL COMPUTING & APPLICATIONS, 2018, 29 (03): 637-651

[3] Mitigate the reverberation effect on the speaker verification performance using different methods [J]. Al-Karawi, Khamis A. INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2021, 24 (01): 143-153

[4] Machine learning- and statistical-based voice analysis of Parkinson's disease patients: A survey [J]. Amato, Federica; Saggio, Giovanni; Cesarini, Valerio; Olmo, Gabriella; Costantini, Giovanni. EXPERT SYSTEMS WITH APPLICATIONS, 2023, 219

[5] [Anonymous], 1987, Speech Communications: Human and Machine

[6] Aslan Z., 2018, Int. J. Energy Eng. Sci, V3, P16

[7] Short-Utterance-Based Children's Speaker Verification in Low-Resource Conditions [J]. Aziz, Shahid; Ankita; Shahnawazuddin, S. CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2024, 43 (03): 1715-1740

[8] Bogert B.P., 1963, Proceedings of the Symposium on Time Series Analysis, V15, P209

[9] Comparison of Modern Deep Learning Models for Speaker Verification [J]. Brydinskyi, Vitalii; Khoma, Yuriy; Sabodashko, Dmytro; Podpora, Michal; Khoma, Volodymyr; Konovalov, Alexander; Kostiak, Maryna. APPLIED SCIENCES-BASEL, 2024, 14 (04)

[10] Voice Disorder Multi-Class Classification for the Distinction of Parkinson's Disease and Adductor Spasmodic Dysphonia [J]. Cesarini, Valerio; Saggio, Giovanni; Suppa, Antonio; Asci, Francesco; Pisani, Antonio; Calculli, Alessandra; Fayad, Rayan; Hajj-Hassan, Mohamad; Costantini, Giovanni. APPLIED SCIENCES-BASEL, 2023, 13 (15)
