Familiar and unfamiliar speaker recognition assessment and system emulation for cochlear implant users

Cited by: 1
Authors
Mamun, Nursadul [1]
Ghosh, Ria [1]
Hansen, John H. L. [1]
Affiliations
[1] Univ Texas Dallas, Ctr Robust Speech Syst CRSS CILab, Cochlear Implant Proc Lab, Dallas, TX 75080 USA
Funding
National Institutes of Health (USA);
Keywords
SPEECH RECOGNITION; IDENTIFICATION; STRATEGIES; GENDER;
DOI
10.1121/10.0017216
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
In the area of speech processing, human speaker identification under naturalistic environments is a challenging task, especially for hearing-impaired individuals with cochlear implants (CIs) or hearing aids (HAs). Motivated by the fact that electrodograms reflect direct CI stimulation of input audio, this study proposes a speaker identification (ID) investigation using two-dimensional electrodograms constructed from the responses of a CI auditory system to emulate CI speaker ID capabilities. Features are extracted from electrodograms through an identity vector (i-vector) framework to train and generate identity models for each speaker using a Gaussian mixture model-universal background model followed by probabilistic linear discriminant analysis. To validate the proposed system, perceptual speaker ID for 20 normal hearing (NH) and seven CI listeners was evaluated with a total of 41 different speakers and compared with the scores from the proposed system. A one-way analysis of variance showed that the proposed system can reliably predict the speaker ID capability of CI (F[1,10] = 0.18, p = 0.68) and NH (F[1,20] = 0, p = 0.98) listeners in naturalistic environments. The impact of speaker familiarity is also addressed, and the results show a reduced performance for speaker recognition by CI subjects using their CI processor, highlighting limitations of current speech processing strategies used in CIs/HAs.
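The comparison between system scores and listener scores reported in the abstract rests on a one-way analysis of variance. A minimal sketch of that statistic follows, assuming two groups of scores. The function name `one_way_anova` and all score values are hypothetical placeholders chosen for illustration, not code or data from the paper.

```python
from statistics import mean


def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical placeholder scores (percent correct), NOT data from the paper.
listener_scores = [62.0, 58.0, 65.0, 60.0, 55.0, 63.0]
system_scores = [61.0, 59.0, 64.0, 57.0, 60.0, 63.0]

f_stat, df_b, df_w = one_way_anova(listener_scores, system_scores)
print(f"F[{df_b},{df_w}] = {f_stat:.3f}")
```

With two groups of six scores each, the degrees of freedom come out as (1, 10), the same form as the F[1,10] statistic quoted for the CI comparison. A small F with a large p-value means the test found no significant difference between the two groups. Converting F to a p-value requires the F distribution, which this sketch leaves out.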
Pages: 1293-1306
Page count: 14