The processing of intimately familiar and unfamiliar voices: Specific neural responses of speaker recognition and identification

Cited by: 9

Authors:

Plante-Hebert, Julien [1]

Boucher, Victor J. [1]

Jemel, Boutheina [2,3]

Affiliations:

[1] Univ Montreal, Dept Linguist & Traduct, Lab Sci Phonet, Montreal, PQ, Canada

[2] Hop Riviere Prairies, Lab Rech Neurosci & Electrophysiol Cognit, Montreal, PQ, Canada

[3] Univ Montreal, Fac Med, Ecole Orthophonie & Audiol, Montreal, PQ, Canada

Source:

PLOS ONE | 2021 / Volume 16 / Issue 04

Keywords:

ONLY EXPERIENCES; EPISODIC MEMORY; TERM-MEMORY; DISCRIMINATION; RECOLLECTION; IDENTITY; PEOPLE; FACES; ERP; PHONAGNOSIA;

DOI:

10.1371/journal.pone.0250214

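Chinese Library Classification (CLC):

O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];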

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

Research has repeatedly shown that familiar and unfamiliar voices elicit different neural responses. It has also been suggested, however, that different neural correlates are associated with the feeling of having heard a voice and with knowing who the voice represents. The terminology used to designate these varying responses remains vague, creating a degree of confusion in the literature. Additionally, the terms used to designate tasks of voice discrimination, voice recognition, and speaker identification are often inconsistent, creating further ambiguities. The present study used event-related potentials (ERPs) to clarify the difference between responses to 1) unknown voices, 2) trained-to-familiar voices as speech stimuli are repeatedly presented, and 3) intimately familiar voices. In an experiment, 13 participants listened to repeated utterances recorded from 12 speakers. Only one of the 12 voices was intimately familiar to a participant, whereas the remaining 11 voices were unfamiliar. The frequency of presentation of these 11 unfamiliar voices varied, with only one being frequently presented (the trained-to-familiar voice). ERP analyses revealed different responses for intimately familiar and unfamiliar voices in two distinct time windows (P2 between 200-250 ms and a late positive component, LPC, between 450-850 ms post-onset), with late responses occurring only for intimately familiar voices. The LPC presents sustained shifts, and short-time ERP components appear to reflect an early recognition stage. The trained voice also elicited distinct responses compared to rarely heard voices, but these occurred in a third time window (N250 between 300-350 ms post-onset). Overall, the timing of responses suggests that the processing of intimately familiar voices operates in two distinct steps: voice recognition, marked by a P2 on right centro-frontal sites, and speaker identification, marked by an LPC component. The recognition of frequently heard voices entails an independent recognition process marked by a differential N250. Based on the present results and previous observations, it is proposed that there is a need to distinguish between processes of voice "recognition" and "identification". The present study also specifies test conditions serving to reveal this distinction in neural responses, one of which bears on the length of speech stimuli given the late responses associated with voice identification.

Pages: 20

