Gender Recognition Based on the Stacking of Different Acoustic Features

Cited by: 3
Authors
Yuecesoy, Erguen [1]
Affiliations
[1] Ordu Univ, Vocat Sch Tech Sci, TR-52200 Ordu, Turkiye
Source
APPLIED SCIENCES-BASEL | 2024 / Volume 14 / Issue 15
Keywords
gender recognition; hybrid features; MFCC; KNN; LDA; CNN; MLP; machine learning; deep learning
DOI
10.3390/app14156564
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
A speech signal can provide various information about a speaker, such as their gender, age, accent, and emotional state. The gender of the speaker is the most salient piece of information contained in the speech signal and is used, directly or indirectly, in many applications. In this study, a new approach is proposed for recognizing the gender of the speaker using hybrid features created by stacking different types of features. For this purpose, five different features, namely Mel frequency cepstral coefficients (MFCC), Mel scaled power spectrogram (Mel Spectrogram), Chroma, Spectral contrast (Contrast), and Tonal Centroid (Tonnetz), and twelve hybrid features created by stacking these features were used. These features were applied to four different classifiers, two based on traditional machine learning (KNN and LDA) and two based on deep learning (CNN and MLP), and the performance of each was evaluated separately. In experiments on the Turkish subset of the Common Voice dataset, hybrid features created by stacking different acoustic features improved gender recognition accuracy by 0.3% to 1.73%.
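The idea of stacking features and passing the result to a classifier can be illustrated with a minimal sketch. It assumes that "stacking" means concatenating per-utterance feature vectors and that each feature has already been reduced to a fixed-length vector; the abstract confirms neither detail. The function names `stack_features` and `knn_predict`, the toy numbers, and the choice of `k=3` are all hypothetical and are not taken from the paper. KNN is used here only because it is one of the four classifiers named above.

```python
import math
from collections import Counter


def stack_features(*feature_vectors):
    """Concatenate several per-utterance feature vectors into one hybrid vector.

    Assumption: stacking is plain concatenation of fixed-length vectors.
    """
    stacked = []
    for vec in feature_vectors:
        stacked.extend(vec)
    return stacked


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training vectors nearest to `query`."""
    # Pair each Euclidean distance with its label, then sort by distance.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Made-up toy vectors standing in for two feature types (NOT real data):
# the first list plays the role of an MFCC-like vector, the second a Chroma-like one.
toy_utterances = [
    ([1.0, 0.9, 1.1], [0.2, 0.1], "female"),
    ([1.1, 1.0, 0.9], [0.1, 0.2], "female"),
    ([0.9, 1.1, 1.0], [0.2, 0.2], "female"),
    ([0.1, 0.2, 0.0], [0.9, 1.0], "male"),
    ([0.0, 0.1, 0.2], [1.0, 0.9], "male"),
    ([0.2, 0.0, 0.1], [1.0, 1.0], "male"),
]

# Build the hybrid training set by stacking the two feature types per utterance.
train_X = [stack_features(a, b) for a, b, _ in toy_utterances]
train_y = [label for _, _, label in toy_utterances]

# Classify a new (also made-up) utterance from its stacked feature vector.
query = stack_features([0.95, 1.0, 1.0], [0.15, 0.15])
predicted = knn_predict(train_X, train_y, query, k=3)
print(len(query), predicted)
```

In a real pipeline the toy vectors would be replaced by features extracted from audio, and the same stacked vectors could be passed to any of the other classifiers named in the abstract.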
Pages: 13