Familiar and unfamiliar speaker recognition assessment and system emulation for cochlear implant users

Cited by: 1

Authors:

Mamun, Nursadul ^{[1]}

Ghosh, Ria ^{[1]}

Hansen, John H. L. ^{[1]}

Affiliations:

[1] Univ Texas Dallas, Ctr Robust Speech Syst CRSS CILab, Cochlear Implant Proc Lab, Dallas, TX 75080 USA

Source:

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA | 2023 / Vol. 153 / Issue 02

Funding:

National Institutes of Health (NIH), USA;

Keywords:

SPEECH RECOGNITION; IDENTIFICATION; STRATEGIES; GENDER;

DOI:

10.1121/10.0017216

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification code:

070206 ; 082403 ;

Abstract:

In the area of speech processing, human speaker identification under naturalistic environments is a challenging task, especially for hearing-impaired individuals with cochlear implants (CIs) or hearing aids (HAs). Motivated by the fact that electrodograms reflect direct CI stimulation of input audio, this study proposes a speaker identification (ID) investigation using two-dimensional electrodograms constructed from the responses of a CI auditory system to emulate CI speaker ID capabilities. Features are extracted from electrodograms through an identity vector (i-vector) framework to train and generate identity models for each speaker using a Gaussian mixture model-universal background model followed by probabilistic linear discriminant analysis. To validate the proposed system, perceptual speaker ID for 20 normal hearing (NH) and seven CI listeners was evaluated with a total of 41 different speakers and compared with the scores from the proposed system. A one-way analysis of variance showed that the proposed system can reliably predict the speaker ID capability of CI (F[1,10] = 0.18, p = 0.68) and NH (F[1,20] = 0, p = 0.98) listeners in naturalistic environments. The impact of speaker familiarity is also addressed, and the results show a reduced performance for speaker recognition by CI subjects using their CI processor, highlighting limitations of current speech processing strategies used in CIs/HAs.
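For readers unfamiliar with the F[df1,df2] notation in the abstract, the following is a minimal sketch of how a one-way analysis of variance produces an F statistic and its two degrees of freedom. It is not the authors' analysis code. The function name `one_way_anova`, the two score lists, and the six-per-group split are all hypothetical; the values are placeholders and not data from the paper. The only link to the abstract is that two groups with twelve observations in total give the same degrees of freedom as F[1,10].

```python
from statistics import mean


def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the given groups."""
    k = len(groups)                                # number of groups
    n_total = sum(len(g) for g in groups)          # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical placeholder accuracies (percent); NOT results from the paper.
listener_scores = [62.0, 55.0, 70.0, 48.0, 66.0, 59.0]
system_scores = [60.0, 57.0, 68.0, 52.0, 63.0, 61.0]

f_stat, df_b, df_w = one_way_anova(listener_scores, system_scores)
print(f"F[{df_b},{df_w}] = {f_stat:.2f}")
```

A small F with a large p-value, as reported in the abstract, means the test found no significant difference between the group means, which the authors interpret as the system scores tracking the listener scores.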

Citation

Pages: 1293-1306

Page count: 14

41 references in total

[1] Ali H., 2018, The Journal of the Acoustical Society of America, V144, P1872, DOI 10.1121/1.5068238

[2] Arndt P. L., 1999, C IMPLANTABLE AUDITO

[3] Understanding Voice Perception [J]. Belin, Pascal; Bestelmeyer, Patricia E. G.; Latinus, Marianne; Watson, Rebecca. BRITISH JOURNAL OF PSYCHOLOGY, 2011, 102: 711-725

[4] Brookes M., 2011, VOICEBOX: Speech processing toolbox for MATLAB

[5] CAMPBELL JP, 1995, INT CONF ACOUST SPEE, P341, DOI 10.1109/ICASSP.1995.479543

[6] GENDER RECOGNITION FROM SPEECH .2. FINE ANALYSIS [J]. CHILDERS, DG; WU, K. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1991, 90 (04): 1841-1856

[7] CLARK GM, 1986, OTOLARYNG CLIN N AM, V19, P329

[8] Front-End Factor Analysis for Speaker Verification [J]. Dehak, Najim; Kenny, Patrick J.; Dehak, Reda; Dumouchel, Pierre; Ouellet, Pierre. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (04): 788-798

[9] The NIST speaker recognition evaluation - Overview, methodology, systems, results, perspective [J]. Doddington, GR; Przybocki, MA; Martin, AF; Reynolds, DA. SPEECH COMMUNICATION, 2000, 31 (2-3): 225-254

[10] Perceiving the sex and identity of a talker without natural vocal timbre [J]. Fellowes, JM; Remez, RE; Rubin, PE. PERCEPTION & PSYCHOPHYSICS, 1997, 59 (06): 839-849
