Gender and Age Estimation Methods Based on Speech Using Deep Neural Networks

Cited by: 28
Authors
Kwasny, Damian [1]
Hemmerling, Daria [1]
Affiliations
[1] AGH Univ Sci & Technol, Dept Measurement & Elect, PL-30059 Krakow, Poland
Keywords
speech processing; neural networks; gender classification; age estimation; x-vector; RECOGNITION
DOI
10.3390/s21144785
Chinese Library Classification
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
The speech signal carries a wide range of information about the speaker, such as gender, age, accent, and health state. In this paper, we explore different approaches to automatic speaker gender classification and age estimation from speech signals. We apply various Deep Neural Network-based embedder architectures, such as x-vector and d-vector, to the age estimation and gender classification tasks. Furthermore, we apply a transfer learning-based training scheme: the embedder network is pre-trained for a speaker recognition task on the VoxCeleb1 dataset and then fine-tuned for the joint age estimation and gender classification task. The best-performing system achieves new state-of-the-art results on the age estimation task on the popular TIMIT dataset, with a mean absolute error (MAE) of 5.12 years for male and 5.29 years for female speakers, a root mean square error (RMSE) of 7.24 and 8.12 years for male and female speakers, respectively, and an overall gender recognition accuracy of 99.60%.
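The three evaluation metrics named in the abstract (MAE, RMSE, and classification accuracy) can be illustrated with a minimal sketch. The function names and the toy ages and gender labels below are hypothetical and chosen only for illustration; they are not taken from the paper, TIMIT, or the authors' code.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error between true and predicted ages, in years."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error between true and predicted ages, in years."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

def accuracy(labels, predictions):
    """Fraction of predictions that match the true class label."""
    return sum(l == p for l, p in zip(labels, predictions)) / len(labels)

# Hypothetical toy data (assumption: not from the paper or any dataset).
true_ages = [25.0, 40.0, 60.0, 33.0]
pred_ages = [28.0, 36.0, 55.0, 33.0]
true_gender = ["M", "F", "F", "M"]
pred_gender = ["M", "F", "M", "M"]

print("MAE:", mae(true_ages, pred_ages))
print("RMSE:", rmse(true_ages, pred_ages))
print("Accuracy:", accuracy(true_gender, pred_gender))
```

Because RMSE squares each error before averaging, it penalizes large age errors more heavily than MAE, which is why the two are usually reported together. Reporting them separately for male and female speakers, as the abstract does, would amount to calling these functions on each subset.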
Pages: 18