A Two-Level Speaker Identification System via Fusion of Heterogeneous Classifiers and Complementary Feature Cooperation

Cited by: 11

Authors:

Al-Qaderi, Mohammad [1]

Lahamer, Elfituri [2]

Rad, Ahmad [2]

Affiliations:

[1] Hashemite Univ, Fac Engn, Dept Mechatron Engn, POB 330127, Zarqa 13133, Jordan

[2] Simon Fraser Univ, Sch Mechatron Syst Engn, Autonomous & Intelligent Syst Lab, Surrey, BC V3T 0A3, Canada

Source:

SENSORS | 2021 / Volume 21 / Issue 15

Keywords:

speaker recognition system; limited speech data; short utterances; social robots; social human-robot interaction; two-stage classifier; fuzzy fusion; SUPPORT VECTOR MACHINES; EXTRACTION METHODS; RECOGNITION; VERIFICATION; ROBOTS; MODEL;

DOI:

10.3390/s21155097

Chinese Library Classification (CLC) number:

O65 [Analytical Chemistry];

Discipline classification code:

070302 ; 081704 ;

Abstract:

We present a new architecture to address the challenges of speaker identification that arise when humans interact with social robots. Although deep learning systems have led to impressive performance in many speech applications, limited speech data at the training stage and short utterances with background noise at the test stage remain open problems, as no optimum solution has been reported to date. The proposed design employs a generative model, the Gaussian mixture model (GMM), and a discriminative model, the support vector machine (SVM), together with prosodic features and short-term spectral features to concurrently classify a speaker's gender and identity. The architecture works in a semi-sequential manner consisting of two stages: the first classifier exploits the prosodic features to determine the speaker's gender, which in turn is used with the short-term spectral features as input to the second classifier system in order to identify the speaker. The second classifier system feeds two types of short-term spectral features, namely mel-frequency cepstral coefficients (MFCC) and gammatone frequency cepstral coefficients (GFCC), as well as gender information, to two different classifiers (GMM and GMM supervector-based SVM), which in total leads to the construction of four classifiers. The outputs of the second-stage classifiers, namely the GMM-MFCC maximum likelihood classifier (MLC), GMM-GFCC MLC, GMM-MFCC supervector SVM, and GMM-GFCC supervector SVM, are fused at score level by the weighted Borda count approach. The weight factors are computed on the fly via a Mamdani fuzzy inference system whose inputs are the signal-to-noise ratio and the length of the utterance.
Experimental evaluations suggest that the proposed architecture and the fusion framework are promising and can improve the recognition performance of the system in challenging environments where the signal-to-noise ratio is low and the utterance is short; such scenarios often arise in social robot interactions with humans.
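
The fusion step named in the abstract, a weighted Borda count over the four second-stage classifiers, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `weighted_borda`, the speaker labels, and the fixed weights are hypothetical, and the weights here are hard-coded stand-ins for the values that the paper derives from a Mamdani fuzzy inference system driven by signal-to-noise ratio and utterance length.

```python
from collections import defaultdict


def weighted_borda(rankings, weights):
    """Fuse ranked candidate lists with a weighted Borda count.

    rankings: dict mapping classifier name -> list of speaker IDs, best first
    weights:  dict mapping classifier name -> non-negative weight
    Returns the speaker IDs sorted from best to worst fused score.
    """
    scores = defaultdict(float)
    for name, ranking in rankings.items():
        n = len(ranking)
        for position, speaker in enumerate(ranking):
            # The top-ranked speaker earns n - 1 points, the last earns 0,
            # and each classifier's points are scaled by its weight.
            scores[speaker] += weights[name] * (n - 1 - position)
    # Sort by descending score; ties are broken by speaker ID for determinism.
    return sorted(scores, key=lambda s: (-scores[s], s))


# Hypothetical rankings from the four classifiers listed in the abstract.
rankings = {
    "gmm_mfcc_mlc": ["spk_a", "spk_b", "spk_c"],
    "gmm_gfcc_mlc": ["spk_b", "spk_a", "spk_c"],
    "gmm_mfcc_svm": ["spk_a", "spk_c", "spk_b"],
    "gmm_gfcc_svm": ["spk_b", "spk_c", "spk_a"],
}

# Assumed fixed weights; the paper computes these on the fly with fuzzy inference.
weights = {
    "gmm_mfcc_mlc": 0.2,
    "gmm_gfcc_mlc": 0.3,
    "gmm_mfcc_svm": 0.2,
    "gmm_gfcc_svm": 0.3,
}

fused = weighted_borda(rankings, weights)
print(fused)
```

With equal weights the two leading speakers in this toy example tie, so the outcome depends entirely on how much each classifier is trusted, which is the role the fuzzy weighting plays in the described architecture.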

Pages: 30
