Gender Recognition Based on the Stacking of Different Acoustic Features

Cited by: 2
Author
Yuecesoy, Erguen [1]
Affiliation
[1] Ordu Univ, Vocat Sch Tech Sci, TR-52200 Ordu, Turkiye
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 15
Keywords
gender recognition; hybrid features; MFCC; KNN; LDA; CNN; MLP; machine learning; deep learning;
DOI
10.3390/app14156564
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703;
Abstract
A speech signal can provide various information about a speaker, such as their gender, age, accent, and emotional state. The gender of the speaker is the most salient piece of information contained in the speech signal and is used, directly or indirectly, in many applications. In this study, a new approach is proposed for recognizing the gender of the speaker using hybrid features created by stacking different types of features. For this purpose, five different features, namely Mel frequency cepstral coefficients (MFCC), Mel scaled power spectrogram (Mel Spectrogram), Chroma, Spectral contrast (Contrast), and Tonal Centroid (Tonnetz), and twelve hybrid features created by stacking these features were used. These features were applied to four different classifiers, two based on traditional machine learning (KNN and LDA) and two based on deep learning (CNN and MLP), and the performance of each was evaluated separately. In experiments conducted on the Turkish subset of the Common Voice dataset, hybrid features created by stacking different acoustic features improved gender recognition accuracy by 0.3% to 1.73%.
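The stacking idea described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the feature dimensions, the synthetic data generator, and the helper names (`stack_features`, `knn_predict`, `synthetic_utterance`) are all assumptions made for illustration, and random numbers stand in for real acoustic features extracted from audio. The sketch concatenates per-utterance feature vectors into one hybrid vector and classifies it with a plain k-nearest-neighbour vote.

```python
import math
import random

# Hypothetical per-utterance feature sizes (assumptions, not taken from the paper).
FEATURE_DIMS = {"mfcc": 13, "chroma": 12, "contrast": 7, "tonnetz": 6}


def stack_features(features, names):
    """Concatenate the named feature vectors into one hybrid vector."""
    stacked = []
    for name in names:
        stacked.extend(features[name])
    return stacked


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors nearest to query (Euclidean)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = [train_y[i] for i in order[:k]]
    return max(set(votes), key=votes.count)


def synthetic_utterance(rng, label):
    """Random stand-in for real acoustic features; the label shifts the mean."""
    shift = 1.0 if label == "male" else -1.0
    return {
        name: [rng.gauss(shift, 1.0) for _ in range(dim)]
        for name, dim in FEATURE_DIMS.items()
    }


if __name__ == "__main__":
    rng = random.Random(0)
    hybrid = ["mfcc", "chroma"]  # one possible hybrid: MFCC stacked with Chroma
    labels = ["male", "female"] * 20
    train_x = [stack_features(synthetic_utterance(rng, y), hybrid) for y in labels]
    query = stack_features(synthetic_utterance(rng, "female"), hybrid)
    print(len(query), knn_predict(train_x, labels, query))
```

With real data, each entry of the feature dictionary would come from an audio feature extractor, typically averaged over time so that every utterance yields a fixed-length vector before stacking.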
Pages: 13