Automatic Recognition of Speaker Age and Gender Based on Deep Neural Networks

Cited by: 7
Authors
Markitantov, Maxim [1]
Verkholyak, Oxana [1]
Affiliations
[1] Russian Acad Sci SPIIRAS, St Petersburg Inst Informat & Automat, St Petersburg, Russia
Source
SPEECH AND COMPUTER, SPECOM 2019 | 2019 / Vol. 11658
Funding
Russian Science Foundation;
Keywords
Age and gender recognition; Computational Paralinguistics; Deep neural networks; Convolutional neural networks; Machine learning; CLASSIFICATION;
DOI
10.1007/978-3-030-26061-3_34
Chinese Library Classification number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
In this article, we present a novel approach to recognizing speaker age and gender from voice, a task in the field of computational paralinguistics, based on deep neural networks. The proposed models were trained and tested on the German speech corpus aGender. We conducted experiments with different network topologies, including neural networks with fully-connected and convolutional layers. In joint recognition of speaker age and gender, our system reached an unweighted accuracy of 48.41%. In the separate age and gender recognition setups, the obtained performance was 57.53% and 88.80%, respectively. The applied deep neural networks provide the best speaker age recognition result in comparison to existing traditional classification methods.
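The abstract reports performance as unweighted accuracy. A minimal sketch of that metric follows, assuming the definition common in computational paralinguistics (the mean of per-class recalls, so each class counts equally regardless of its size). The function name `unweighted_accuracy` and the toy labels are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def unweighted_accuracy(y_true, y_pred):
    """Return the mean of per-class recalls (assumed definition of unweighted accuracy)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly recognized samples per true class
    for truth, guess in zip(y_true, y_pred):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1

    # Average the recalls with equal weight per class, not per sample.
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels: four samples of class "f", one of class "m".
y_true = ["f", "f", "f", "f", "m"]
y_pred = ["f", "f", "f", "f", "f"]
ua = unweighted_accuracy(y_true, y_pred)
```

On imbalanced data this value can be far below plain accuracy: a classifier that always predicts the majority class scores well per sample but poorly per class, which is why the metric suits corpora with uneven class sizes.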
Pages: 327-336
Page count: 10