Speaker Identification through Spectral Entropy Analysis

Cited by: 0

Authors:

Camarena-Ibarrola, Antonio [1]

Luque, Fernando [2]

Chavez, Edgar [2]

Affiliations:

[1] Univ Michoacana, Morelia, Michoacan, Mexico

[2] Ctr Invest Cient & Educ Super Ensenada, Ensenada, Baja California, Mexico

Source:

2017 IEEE INTERNATIONAL AUTUMN MEETING ON POWER, ELECTRONICS AND COMPUTING (ROPEC) | 2017

Keywords:

SPEECH; RECOGNITION;

DOI:

Not available

Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronics and Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Identifying a speaker in a noisy environment is still a challenging problem. In spite of the impressive efficacy of deep architectures, the solution obtained is an obscure mapping, a black box. For transparent classifiers, the standard features are the Mel-Frequency Cepstral Coefficients (MFCC). In this paper we build entropy vectors out of the first sixteen critical bands of the Bark scale and use them as features. The classifier consists of finding the vector sequences in the database that are closest to those of the query and counting the hits, as in a k-NN classifier. In one case we use the MFCC (the state of the art) and in the other case we use the described entropy vectors. For the text-independent case, we used gapped vector sequences, which are discussed later in the paper. For text-dependent speaker identification we obtained 80% true positives at 10% false positives, while the MFCC reach about 60% true positives at the same false positive rate. The results are about the same for the text-independent case. In general we obtained more area under the respective ROC curves.
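The feature and classifier described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the entropy estimator, the frame length, or the distance measure, so the code assumes the Shannon entropy of the normalized power spectrum within each band, the commonly tabulated Zwicker critical-band edges, and Euclidean distance. It also votes on single vectors rather than on the vector sequences the paper uses. The names `BARK_EDGES_HZ`, `band_entropies`, and `identify` are hypothetical.

```python
import numpy as np

# Assumption: Zwicker critical-band edges in Hz. The 17 edges delimit
# the first 16 Bark bands (the last one spans 2700-3150 Hz).
BARK_EDGES_HZ = np.array([0, 100, 200, 300, 400, 510, 630, 770, 920, 1080,
                          1270, 1480, 1720, 2000, 2320, 2700, 3150])


def band_entropies(frame, sample_rate):
    """Return one Shannon entropy (in bits) per Bark band for one frame."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    entropies = np.zeros(len(BARK_EDGES_HZ) - 1)
    for b, (lo, hi) in enumerate(zip(BARK_EDGES_HZ[:-1], BARK_EDGES_HZ[1:])):
        p = power[(freqs >= lo) & (freqs < hi)]
        total = p.sum()
        if p.size == 0 or total <= 0:
            continue  # empty or silent band: entropy stays at 0
        p = p / total          # treat the band's spectrum as a distribution
        p = p[p > 0]           # skip zero bins so log2 is defined
        entropies[b] = -np.sum(p * np.log2(p))
    return entropies


def identify(query_vectors, db_vectors, db_labels):
    """Each query vector gives one hit to the speaker of its nearest
    database vector (Euclidean distance, an assumption); the speaker
    with the most hits wins."""
    hits = {}
    for q in query_vectors:
        nearest = np.argmin(np.linalg.norm(db_vectors - q, axis=1))
        label = db_labels[nearest]
        hits[label] = hits.get(label, 0) + 1
    return max(hits, key=hits.get), hits
```

In use, `band_entropies` would be applied to each short frame of an utterance to produce a sequence of 16-dimensional vectors, and `identify` would compare the query's vectors against vectors stored for the enrolled speakers.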


Pages: 6

