Voice Gender Recognition Using Acoustic Features, MFCCs and SVM

Cited by: 0
Authors
Abakarim, Fadwa [1]
Abenaou, Abdenbi [1]
Affiliations
[1] Ibn Zohr Univ, Natl Sch Appl Sci, Res Team Appl Math & Intelligent Syst Engn, Agadir 80000, Morocco
Source
COMPUTATIONAL SCIENCE AND ITS APPLICATIONS, ICCSA 2022, PT I | 2022 / Vol. 13375
Keywords
Signal processing; Gender recognition; Acoustic features; Mel-Frequency Cepstral Coefficients; Zero-crossing rate; Support Vector Machine;
DOI
10.1007/978-3-031-10522-7_43
Chinese Library Classification number
TP [Automation technology, computer technology];
Subject classification code
0812 ;
Abstract
This paper presents a voice gender recognition system. Acoustic features and Mel-Frequency Cepstral Coefficients (MFCCs) are extracted to determine the speaker's gender. Acoustic features are the most commonly used features in studies of this kind; in this work, we combine them with MFCCs to test whether the combination yields better results. To evaluate the proposed system, we used four databases: the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the Saarbruecken Voice Database (SVD), the CMU_ARCTIC database, and a self-created Amazigh speech database. In the pre-processing stage, we removed silence from the signals using the Zero-Crossing Rate (ZCR) but retained the noise. A Support Vector Machine (SVM) is used as the classification model. The combination of acoustic features and MFCCs achieves an average accuracy of 90.61% with the RAVDESS database, 92.73% with the SVD database, 99.87% with the CMU_ARCTIC database, and 99.95% with the Amazigh speech database.
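The abstract does not state how the ZCR is turned into a silence decision, so the following is only a minimal sketch of one plausible approach, not the authors' method. It computes the zero-crossing rate per frame and drops frames whose rate exceeds a threshold, on the assumption that a pause containing only faint background hiss crosses zero far more often than voiced speech. The function names, the 400-sample frame length, and the 0.3 threshold are all hypothetical choices made for illustration. Kept frames are not denoised.

```python
import math
import random


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def remove_high_zcr_frames(signal, frame_len=400, zcr_threshold=0.3):
    """Keep only non-overlapping frames whose ZCR is at or below the threshold.

    Both frame_len and zcr_threshold are hypothetical values; the abstract
    does not specify the rule or the parameters that were actually used.
    """
    kept = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        if zero_crossing_rate(frame) <= zcr_threshold:
            kept.extend(frame)
    return kept


# Synthetic demo: a 200 Hz tone sampled at 16 kHz (stand-in for voiced speech),
# followed by a pause that contains only faint background hiss.
sr = 16000
tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(1600)]
rng = random.Random(0)
pause = [rng.uniform(-0.01, 0.01) for _ in range(1600)]
trimmed = remove_high_zcr_frames(tone + pause)
```

In this synthetic case the tone frames have a low ZCR and survive, while the hiss-only frames are discarded. On real recordings a ZCR rule is often paired with a short-time energy check, and the threshold would need tuning per database.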
Pages: 634-648
Number of pages: 15