An effective gender recognition approach using voice data via deeper LSTM networks

Cited by: 46
Author
Ertam, Fatih [1]
Affiliation
[1] Firat Univ, Technol Fac, Dept Digital Forens Engn, Elazig, Turkey
Keywords
Gender recognition; Gender classification; Deep learning; Deeper LSTM; Machine learning; SPEAKERS AGE; CLASSIFICATION; FRAMEWORK;
DOI
10.1016/j.apacoust.2019.07.033
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
People can usually identify a speaker's gender from an audio recording without difficulty, drawing on the experience they have acquired. For computer systems, however, predicting whether a speaker is a man or a woman is not easy, and many papers and proposals have been presented to solve this problem. In this study, a Deeper Long Short-Term Memory (LSTM) network structure was used to predict gender from an audio data set. The proposed approach consists of three main steps: (i) the 10 most effective data attributes were selected; (ii) a deep learning-based network was created with a double-layer LSTM structure; and (iii) in addition to the classification accuracy, the sensitivity and specificity performance metrics were calculated. The accuracy of the proposed method was also compared with the accuracy values obtained from classifiers generated by conventional machine learning approaches. The study predicted gender with an accuracy of 98.4%. It is thought that the study will be a pioneer in this field as an effective and fast approach for gender recognition. (C) 2019 Elsevier Ltd. All rights reserved.
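The abstract reports accuracy, sensitivity, and specificity but does not spell out how they are computed. A minimal sketch of the standard definitions for a two-class problem might look like the following. The function name `binary_metrics`, the label strings, and the choice of "female" as the positive class are assumptions made for illustration and are not taken from the paper.

```python
import math


def binary_metrics(y_true, y_pred, positive="female"):
    """Return (accuracy, sensitivity, specificity) for two-class labels.

    The positive class is an illustrative assumption; swapping it
    swaps sensitivity and specificity.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # positive sample predicted positive
            else:
                fp += 1  # negative sample predicted positive
        else:
            if truth == positive:
                fn += 1  # positive sample predicted negative
            else:
                tn += 1  # negative sample predicted negative

    def ratio(num, den):
        # Avoid division by zero when a class is absent.
        return num / den if den else math.nan

    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    sensitivity = ratio(tp, tp + fn)  # true positive rate
    specificity = ratio(tn, tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity


# Hypothetical toy labels, not data from the study.
y_true = ["female", "female", "male", "male"]
y_pred = ["female", "male", "male", "male"]
acc, sens, spec = binary_metrics(y_true, y_pred)
```

Reporting sensitivity and specificity alongside accuracy shows whether errors are concentrated in one class, which a single accuracy figure can hide.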
Pages: 351-358
Number of pages: 8
Related papers
33 in total
[1]  
Acero A, 1996, INT CONF ACOUST SPEE, P342, DOI 10.1109/ICASSP.1996.541102
[2]  
[Anonymous], 2007, 2007 IEEE INT C AC S
[3]   Classification by clustering decision tree-like classifier based on adjusted clusters [J].
Aviad, Barak ;
Roy, Gelbard .
EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (07) :8220-8228
[4]   A deep learning framework for financial time series using stacked autoencoders and long-short term memory [J].
Bao, Wei ;
Yue, Jun ;
Rao, Yulei .
PLOS ONE, 2017, 12 (07)
[5]   A new pitch-range based feature set for a speaker's age and gender classification [J].
Barkana, Buket D. ;
Zhou, Jingcheng .
APPLIED ACOUSTICS, 2015, 98 :52-61
[6]  
Black M, 2010, 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, P2030
[7]  
Buyukyilmaz M, 2017, VOICE GENDER RECOGNI, DOI 10.2991/msota-16.2016.90
[8]   A bilevel framework for joint optimization of session compensation and classification for speaker identification [J].
Chen, Chen ;
Wang, Wei ;
He, Yongjun ;
Han, Jiqing .
DIGITAL SIGNAL PROCESSING, 2019, 89 :104-115
[9]  
Graves A, 2012, STUD COMPUT INTELL, V385, P1, DOI [10.1162/neco.1997.9.1.1, 10.1007/978-3-642-24797-2]
[10]  
Harb H., 2003, IEEE INT C MULTIMEDI, V1, P733, DOI 10.1109/ICME.2003.1221721