Automatic Recognition of Speaker Age and Gender Based on Deep Neural Networks

Times cited: 7

Authors:

Markitantov, Maxim [1]

Verkholyak, Oxana [1]

Affiliation:

[1] Russian Acad Sci SPIIRAS, St Petersburg Inst Informat & Automat, St Petersburg, Russia

Source:

SPEECH AND COMPUTER, SPECOM 2019 | 2019 / Vol. 11658

Funding:

Russian Science Foundation;

Keywords:

Age and gender recognition; Computational Paralinguistics; Deep neural networks; Convolutional neural networks; Machine learning; CLASSIFICATION;

DOI:

10.1007/978-3-030-26061-3_34

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

In this article, we present a novel approach to speaker age and gender recognition from voice, a task in the paralinguistic field, based on deep neural networks. The proposed models were trained and tested on the German speech corpus aGender. We conducted experiments with different network topologies, including neural networks with fully-connected and convolutional layers. For joint recognition of speaker age and gender, our system reached an unweighted accuracy of 48.41%. When age and gender were recognized separately, it reached 57.53% and 88.80%, respectively. The applied deep neural networks achieve the best speaker age recognition result compared with existing traditional classification methods.
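The abstract reports results as unweighted accuracy but does not give its formula. The sketch below assumes the usual computational-paralinguistics definition: the mean of per-class recalls (unweighted average recall), so every class counts equally regardless of how many utterances it has. The function name `unweighted_accuracy` and the toy labels are hypothetical illustrations, not taken from the paper or the aGender corpus.

```python
from collections import defaultdict


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls (assumed definition of unweighted accuracy).

    Each class contributes equally, unlike plain (weighted) accuracy,
    which is dominated by the most frequent classes.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Hypothetical toy labels: class "m" dominates the sample.
    y_true = ["m", "m", "m", "m", "f", "c"]
    y_pred = ["m", "m", "m", "m", "m", "c"]
    # Weighted accuracy would be 5/6; unweighted is (1.0 + 0.0 + 1.0) / 3 ≈ 0.667
    print(unweighted_accuracy(y_true, y_pred))
```

Because the minority class "f" is never recognized, unweighted accuracy falls well below plain accuracy. This is why the metric is commonly preferred for class-imbalanced paralinguistic corpora.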

Pages: 327–336

Page count: 10

References: 24

