Voice gender recognition under unconstrained environments using self-attention

Cited by: 17
Authors
Nasef, Mohammed M. [1]
Sauber, Amr M. [1]
Nabil, Mohammed M. [1]
Affiliations
[1] Menoufia Univ, Fac Sci, Math & Comp Sci Dept, Menoufia 32511, Egypt
Keywords
Voice gender recognition; Self-attention; MFCC; Logistic regression; Inception
DOI
10.1016/j.apacoust.2020.107823
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Voice gender recognition is a non-trivial task that has been studied extensively in the literature; however, it becomes more challenging when the voice is surrounded by noise in unconstrained environments. This paper presents two self-attention-based models that deliver an end-to-end voice gender recognition system under unconstrained environments. The first model consists of a stack of six self-attention layers and a dense layer. The second model extends the first by placing a set of convolution layers and six inception-residual blocks before the self-attention layers. Both models use Mel-frequency cepstral coefficients (MFCC) to represent the audio data and logistic regression for classification. The experiments were conducted under unconstrained conditions, including background noise and variation in the speakers' languages, accents, ages and emotional states. The results show that the proposed models achieved accuracies of 95.11% and 96.23%, respectively. These models achieved superior performance in all criteria and are believed to be state-of-the-art for voice gender recognition under unconstrained environments. © 2020 Elsevier Ltd. All rights reserved.
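As a rough illustration of the first model described in the abstract (six stacked self-attention layers feeding a dense logistic output over MFCC frames), the following NumPy sketch may help. It is not the authors' implementation: the abstract does not give layer sizes, head counts, pooling strategy or training details, so the 13-coefficient input, single-head attention, mean pooling over frames and random weights are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    # Single-head scaled dot-product self-attention.
    # x has shape (frames, features); the output has the same shape.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v


def init_params(rng, n_mfcc=13, n_layers=6):
    # Random, untrained weights. n_mfcc=13 is an assumed feature size;
    # n_layers=6 follows the abstract.
    scale = 1.0 / np.sqrt(n_mfcc)
    layers = [
        tuple(rng.normal(0.0, scale, (n_mfcc, n_mfcc)) for _ in range(3))
        for _ in range(n_layers)
    ]
    w_out = rng.normal(0.0, scale, n_mfcc)
    b_out = 0.0
    return layers, w_out, b_out


def predict_proba(mfcc, params):
    # mfcc: array of shape (frames, n_mfcc), e.g. from any MFCC extractor.
    layers, w_out, b_out = params
    h = mfcc
    for w_q, w_k, w_v in layers:
        h = self_attention(h, w_q, w_k, w_v)
    pooled = h.mean(axis=0)  # assumed: average over time frames
    z = float(pooled @ w_out + b_out)
    return 1.0 / (1.0 + np.exp(-z))  # logistic (sigmoid) output


rng = np.random.default_rng(0)
params = init_params(rng)
fake_mfcc = rng.normal(size=(50, 13))  # stand-in for real MFCC features
print(predict_proba(fake_mfcc, params))
```

With untrained weights the printed probability is meaningless; the sketch only shows how MFCC frames would flow through the attention stack to a single logistic output. The second model in the abstract would insert convolution layers and inception-residual blocks before the attention stack, which is not shown here.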
Pages: 11