Estimation of age from speech using excitation source features

Cited by: 2
Authors
Avikal, Shwetank [1]
Sharma, Kritika [2]
Barthwal, Anuragh [3]
Kumar, K. C. Nithin [4]
Badhotiya, Gaurav Kumar [4]
Affiliations
[1] Graph Era Hill Univ, Mech Dept, Dehra Dun, Uttarakhand, India
[2] GL Bajaj Inst Technol & Management, Comp Sci, Greater Noida, India
[3] Shiv Nadar Univ, Comp Sci & Engn, Greater Noida, India
[4] Graph Era Deemed Univ, Mech Engn Dept, Dehra Dun, Uttarakhand, India
Keywords
Linear prediction cepstral coefficients; Age approximation; Excitation source features; Text independent age estimation; RECOGNITION
DOI
10.1016/j.matpr.2021.02.159
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
Distinct features are extracted from speech, and Gaussian mixture models (GMMs) are employed as classifiers. Age estimation performance is evaluated using excitation source features (LPCCs). The estimated age of a speaker is assigned to one of nine age groups: 5-10, 10-15, 15-20, 20-25, 25-30, 30-35, 35-40, 40-45 and 45-50. An age group speech corpus has been collected in this work by recording the voices of Hindi speakers of different age groups and dialects, using ten text prompts in neutral speech. The text prompts are constructed from textually neutral Hindi words and recorded in a neutral emotion. The different age groups are characterized using these text prompts. The average age estimation performance for the multispeaker (male + female) case is around 94%. In this research, different groups of speakers are classified on the basis of the excitation source in human speech. (C) 2021 Elsevier Ltd. All rights reserved.
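The abstract names GMMs as the classifiers but does not spell out the decision rule. A common approach, assumed here rather than taken from the paper, is to train one GMM per age group and assign an utterance to the group whose model gives its feature frames the highest log-likelihood. The sketch below illustrates only that scoring step. The function names, the (weight, mean, variance) component layout, the diagonal covariances, and the two-dimensional toy parameters are all hypothetical; real LPCC vectors would have more dimensions and the models would be trained on the corpus.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of a sequence of feature frames under one GMM.

    `gmm` is a list of (weight, mean, var) components (hypothetical layout).
    """
    total = 0.0
    for x in frames:
        # Log-sum-exp over mixture components for numerical stability.
        logs = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        peak = max(logs)
        total += peak + math.log(sum(math.exp(l - peak) for l in logs))
    return total


def classify_age_group(frames, models):
    """Return the age-group label whose GMM scores the frames highest."""
    return max(models, key=lambda label: gmm_log_likelihood(frames, models[label]))


# Toy two-component models for two of the nine age groups.
# These numbers are made up for illustration and are not from the paper.
toy_models = {
    "5-10": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "45-50": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}

# Stand-in feature frames; in practice each frame would be an LPCC vector.
toy_frames = [[0.2, 0.1], [0.9, 1.1], [0.4, 0.6]]

print(classify_age_group(toy_frames, toy_models))
```

Summing per-frame log-likelihoods treats the frames as independent, which is the usual simplification for text-independent GMM scoring.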
Pages: 11046-11049
Number of pages: 4