The processing of intimately familiar and unfamiliar voices: Specific neural responses of speaker recognition and identification

Cited by: 9
Authors
Plante-Hebert, Julien [1 ]
Boucher, Victor J. [1 ]
Jemel, Boutheina [2 ,3 ]
Affiliations
[1] Univ Montreal, Dept Linguist & Traduct, Lab Sci Phonet, Montreal, PQ, Canada
[2] Hop Riviere Prairies, Lab Rech Neurosci & Electrophysiol Cognit, Montreal, PQ, Canada
[3] Univ Montreal, Fac Med, Ecole Orthophonie & Audiol, Montreal, PQ, Canada
Source
PLOS ONE | 2021 / Vol. 16 / Issue 04
Keywords
ONLY EXPERIENCES; EPISODIC MEMORY; TERM-MEMORY; DISCRIMINATION; RECOLLECTION; IDENTITY; PEOPLE; FACES; ERP; PHONAGNOSIA;
DOI
10.1371/journal.pone.0250214
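Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code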
07 ; 0710 ; 09 ;
Abstract
Research has repeatedly shown that familiar and unfamiliar voices elicit different neural responses. But it has also been suggested that different neural correlates associate with the feeling of having heard a voice and knowing who the voice represents. The terminology used to designate these varying responses remains vague, creating a degree of confusion in the literature. Additionally, terms serving to designate tasks of voice discrimination, voice recognition, and speaker identification are often inconsistent, creating further ambiguities. The present study used event-related potentials (ERPs) to clarify the difference between responses to 1) unknown voices, 2) trained-to-familiar voices as speech stimuli are repeatedly presented, and 3) intimately familiar voices. In an experiment, 13 participants listened to repeated utterances recorded from 12 speakers. Only one of the 12 voices was intimately familiar to a participant, whereas the remaining 11 voices were unfamiliar. The frequency of presentation of these 11 unfamiliar voices varied, with only one being frequently presented (the trained-to-familiar voice). ERP analyses revealed different responses for intimately familiar and unfamiliar voices in two distinct time windows (P2 between 200-250 ms and a late positive component, LPC, between 450-850 ms post-onset), with late responses occurring only for intimately familiar voices. The LPC presents sustained shifts, and short-time ERP components appear to reflect an early recognition stage. The trained voice also elicited distinct responses, compared to rarely heard voices, but these occurred in a third time window (N250 between 300-350 ms post-onset). Overall, the timing of responses suggests that the processing of intimately familiar voices operates in two distinct steps: voice recognition, marked by a P2 on right centro-frontal sites, and speaker identification, marked by an LPC component. The recognition of frequently heard voices entails an independent recognition process marked by a differential N250. Based on the present results and previous observations, it is proposed that there is a need to distinguish between processes of voice "recognition" and "identification". The present study also specifies test conditions serving to reveal this distinction in neural responses, one of which bears on the length of speech stimuli given the late responses associated with voice identification.
Pages: 20
Related papers
45 in total
  • [31] Speaker identification using neural networks and wavelets - Multiresolution decomposition and pattern-recognition techniques enable identification in noisy environments
    Phan, F
    Micheli-Tzanakou, E
    Sideman, S
    IEEE ENGINEERING IN MEDICINE AND BIOLOGY MAGAZINE, 2000, 19 (01): 92 - 101
  • [32] Online signature recognition and writer identification by spatial-temporal neural processing
    Baig, AR
    Hussain, M
    INMIC 2004: 8th International Multitopic Conference, Proceedings, 2004: 381 - 385
  • [33] Classification of Pitch and Gender of Speakers for Forensic Speaker Recognition from Disguised Voices Using Novel Features Learned by Deep Convolutional Neural Networks
    Nair, Athulya M. Swamidasan Unni
    Savithri, Sathidevi P.
    TRAITEMENT DU SIGNAL, 2021, 38 (01) : 221 - 230
  • [34] Enhanced neural responses in specific phases of reward processing in individuals with Internet gaming disorder
    Wang, Lingxiao
    Yang, Guochun
    Zheng, Ya
    Li, Zhenghan
    Qi, Yue
    Li, Qi
    Liu, Xun
    JOURNAL OF BEHAVIORAL ADDICTIONS, 2021, 10 (01) : 99 - 111
  • [35] Gammatonegram representation for end-to-end dysarthric speech processing tasks: speech recognition, speaker identification, and intelligibility assessment
    Aref Farhadipour
    Hadi Veisi
    Iran Journal of Computer Science, 2024, 7 (2) : 311 - 324
  • [36] HYDROGEN DETECTION WITH A GAS SENSOR ARRAY - PROCESSING AND RECOGNITION OF DYNAMIC RESPONSES USING NEURAL NETWORKS
    Gwizdz, Patryk
    Brudnik, Andrzej
    Zakrzewska, Katarzyna
    METROLOGY AND MEASUREMENT SYSTEMS, 2015, 22 (01) : 3 - 12
  • [37] Song Recognition Learning and Stimulus-Specific Weakening of Neural Responses in the Avian Auditory Forebrain
    Thompson, Jason V.
    Gentner, Timothy Q.
    JOURNAL OF NEUROPHYSIOLOGY, 2010, 103 (04) : 1785 - 1797
  • [38] Single-channel dereverberation by feature mapping using cascade neural networks for robust distant speaker identification and speech recognition
    Aditya Arie Nugraha
    Kazumasa Yamamoto
    Seiichi Nakagawa
    EURASIP Journal on Audio, Speech, and Music Processing, 2014
  • [39] Identification of Food Spoilage in the Smart Home based on Neural and Fuzzy Processing of Odour Sensor Responses
    Green, Geoffrey C.
    Chan, Adrian D. C.
    Goubran, Rafik A.
    2009 ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY, VOLS 1-20, 2009: 2625 - 2628
  • [40] Single-channel dereverberation by feature mapping using cascade neural networks for robust distant speaker identification and speech recognition
    Nugraha, Aditya Arie
    Yamamoto, Kazumasa
    Nakagawa, Seiichi
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2014