Gender-based speaker recognition from speech signals using GMM model

Cited by: 14

Authors:

Gupta, Manish [1]

Bharti, Shambhu Shankar [2]

Agarwal, Suneeta [2]

Affiliations:

[1] Motilal Nehru Natl Inst Technol Allahabad, Comp Sci & Engn Dept, Prayagraj 211004, Uttar Pradesh, India

[2] Motilal Nehru Natl Inst Technol Allahabad, Prayagraj 211004, Uttar Pradesh, India

Source:

MODERN PHYSICS LETTERS B | 2019 / Vol. 33 / No. 35

Keywords:

Gender recognition; speaker recognition; GMM; SVM; MFCC; FEATURES;

DOI:

10.1142/S0217984919504384

Chinese Library Classification (CLC) number:

O59 [Applied Physics];

Abstract:

Speech is a convenient medium of communication among human beings. Speaker recognition is the process of automatically recognizing a speaker by processing the information contained in the speech signal. This paper proposes a new two-level approach to speaker recognition from speech signals. At the first level, the gender of the speaker is recognized; at the second level, the speaker is recognized within the gender identified at the first level. Once the gender is known, the search space for the second level is halved, because the speaker recognition system searches only the set of speech signals belonging to the identified gender. Gender is identified using the gender-specific features Mel Frequency Cepstral Coefficients (MFCC) and pitch. The speaker is recognized using the speaker-specific features MFCC, pitch, and RASTA-PLP. Support Vector Machine (SVM) and Gaussian Mixture Model (GMM) classifiers are used for identifying the gender and recognizing the speaker, respectively. Experiments are performed on speech signals from two databases: "IIT-Madras speech synthesis and recognition" (speech samples in English from eight male and eight female speakers of eight different regions) and "ELSDSR" (speech samples in English from five male and five female speakers). The experiments show that the two-level approach reduces the time taken for speaker recognition by 30-32% compared with the single-level approach, in which the speaker is recognized without first identifying the gender. The accuracy of speaker recognition also improves from 99.7% with the single-level approach to 99.9% with the proposed approach. The experiments further indicate that a speech signal with a minimum duration of 1.12 (after discarding silent parts) is sufficient for recognizing the speaker.
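The two-level search described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the SVM gender classifier is replaced by a simple threshold on mean pitch, each speaker's GMM is replaced by a single diagonal Gaussian, and the features are synthetic. The function names, the 165 Hz threshold, and the speaker profiles are all hypothetical. The sketch only shows how a gender decision at the first level shrinks the set of speaker models scored at the second level.

```python
import math
import random
from statistics import mean, pvariance

def fit_gaussian(frames):
    """Fit a diagonal Gaussian (a one-component stand-in for a GMM)."""
    dims = list(zip(*frames))
    means = [mean(d) for d in dims]
    variances = [max(pvariance(d), 1e-6) for d in dims]
    return means, variances

def avg_log_likelihood(model, frames):
    """Average per-frame log-likelihood under a diagonal Gaussian."""
    means, variances = model
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, means, variances):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)

def predict_gender(frames, pitch_threshold=165.0):
    """Level 1 stand-in: threshold on mean pitch (feature 0), not an SVM.
    The 165 Hz threshold is an assumed value for this synthetic data."""
    return "female" if mean(f[0] for f in frames) > pitch_threshold else "male"

def identify(frames, models, genders, two_level=True):
    """Return (best-scoring speaker, number of speaker models scored)."""
    candidates = list(models)
    if two_level:
        gender = predict_gender(frames)
        candidates = [s for s in candidates if genders[s] == gender]
    best = max(candidates, key=lambda s: avg_log_likelihood(models[s], frames))
    return best, len(candidates)

def synth_frames(rng, pitch, offset, n=200):
    """Synthetic 3-D frames: [pitch, feature_1, feature_2]."""
    return [[rng.gauss(pitch, 5.0), rng.gauss(offset, 1.0), rng.gauss(-offset, 1.0)]
            for _ in range(n)]

rng = random.Random(0)
# Hypothetical speakers: (gender, mean pitch in Hz, feature offset).
profiles = {
    "m1": ("male", 110.0, 0.0), "m2": ("male", 125.0, 4.0),
    "f1": ("female", 205.0, 0.0), "f2": ("female", 220.0, 4.0),
}
genders = {s: p[0] for s, p in profiles.items()}
models = {s: fit_gaussian(synth_frames(rng, p[1], p[2])) for s, p in profiles.items()}

# Score an unseen utterance from speaker "f2" with and without the gender stage.
test_frames = synth_frames(rng, *profiles["f2"][1:])
speaker_two, scored_two = identify(test_frames, models, genders, two_level=True)
speaker_one, scored_one = identify(test_frames, models, genders, two_level=False)
print(speaker_two, scored_two, speaker_one, scored_one)
```

With an equal number of male and female speakers, the two-level path scores half as many speaker models as the single-level path, which is the search-space reduction the abstract refers to. The sketch does not attempt to reproduce the reported timing or accuracy figures.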

Pages: 23
