Gender and Age Estimation Methods Based on Speech Using Deep Neural Networks

Cited by: 31
Authors
Kwasny, Damian [1]
Hemmerling, Daria [1]
Affiliations
[1] AGH Univ Sci & Technol, Dept Measurement & Elect, PL-30059 Krakow, Poland
Keywords
speech processing; neural networks; gender classification; age estimation; x-vector; recognition
DOI
10.3390/s21144785
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry]
Discipline classification code
070302; 081704
Abstract
The speech signal carries a wide range of information about the speaker, such as gender, age, accent, and health state. In this paper, we explore different approaches to automatic gender classification and age estimation from speech signals. We apply several Deep Neural Network-based embedder architectures, such as x-vector and d-vector, to the age estimation and gender classification tasks. Furthermore, we apply a transfer learning-based training scheme: the embedder network is pre-trained for a speaker recognition task on the VoxCeleb1 dataset and then fine-tuned for the joint age estimation and gender classification task. The best-performing system achieves new state-of-the-art results on the age estimation task on the popular TIMIT dataset, with a mean absolute error (MAE) of 5.12 years for male and 5.29 years for female speakers, a root mean square error (RMSE) of 7.24 and 8.12 years for male and female speakers, respectively, and an overall gender recognition accuracy of 99.60%.
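The abstract reports MAE and RMSE separately for male and female speakers, plus an overall gender accuracy. The following is a minimal sketch of how such per-gender metrics could be computed. It is not the authors' evaluation code: the function names (`mae`, `rmse`, `accuracy`, `per_gender_errors`) and the toy values are assumptions made for illustration only and do not come from TIMIT or from the paper.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared difference."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def accuracy(labels, predictions):
    """Fraction of predictions that match the reference labels."""
    return sum(l == p for l, p in zip(labels, predictions)) / len(labels)


def per_gender_errors(ages, predicted_ages, genders):
    """Group age errors by the reference gender label (hypothetical helper)."""
    results = {}
    for gender in sorted(set(genders)):
        true_g = [a for a, g in zip(ages, genders) if g == gender]
        pred_g = [p for p, g in zip(predicted_ages, genders) if g == gender]
        results[gender] = {"mae": mae(true_g, pred_g), "rmse": rmse(true_g, pred_g)}
    return results


# Toy, made-up values purely to exercise the functions.
ages = [25, 40, 31, 58]
predicted_ages = [27, 35, 30, 62]
genders = ["m", "f", "m", "f"]
predicted_genders = ["m", "f", "m", "m"]

print(per_gender_errors(ages, predicted_ages, genders))
print(accuracy(genders, predicted_genders))
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same data and grows faster when a few predictions are far off, which is consistent with the reported RMSE values being larger than the MAE values.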
Pages: 18