Simulation of talking faces in the human brain improves auditory speech recognition

Cited by: 115
Authors
von Kriegstein, Katharina [1 ,2 ]
Dogan, Oezguer [3 ]
Grueter, Martina [4 ]
Giraud, Anne-Lise [5 ]
Kell, Christian A. [3 ,5 ]
Grueter, Thomas [4 ]
Kleinschmidt, Andreas [6 ,7 ]
Kiebel, Stefan J. [1 ]
Affiliations
[1] UCL, Wellcome Trust Ctr Neuroimaging, London WC1N 3BG, England
[2] Univ Newcastle, Sch Med, Newcastle Upon Tyne NE2 4HH, Tyne & Wear, England
[3] Goethe Univ Frankfurt, Dept Neurol, D-60528 Frankfurt, Germany
[4] Univ Vienna, Dept Psychol Basic Res, A-1010 Vienna, Austria
[5] Ecole Normale Super, Dept Etud Cognit, F-75005 Paris, France
[6] CEA, NeuroSpin, F-91401 Gif Sur Yvette, France
[7] Inst Natl Sante & Rech Med, F-91401 Gif Sur Yvette, France
Funding
Wellcome Trust (UK)
Keywords
fMRI; multisensory; predictive coding; prosopagnosia
DOI
10.1073/pnas.0710826105
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline codes
07; 0710; 09
Abstract
Human face-to-face communication is essentially audiovisual. Typically, people talk to us face-to-face, providing concurrent auditory and visual input. Understanding someone is easier when there is visual input, because visual cues like mouth and tongue movements provide complementary information about speech content. Here, we hypothesized that, even in the absence of visual input, the brain optimizes both auditory-only speech and speaker recognition by harvesting speaker-specific predictions and constraints from distinct visual face-processing areas. To test this hypothesis, we performed behavioral and neuroimaging experiments in two groups: subjects with a face recognition deficit (prosopagnosia) and matched controls. The results show that observing a specific person talking for 2 min improves subsequent auditory-only speech and speaker recognition for this person. In both prosopagnosics and controls, behavioral improvement in auditory-only speech recognition was based on an area typically involved in face-movement processing. Improvement in speaker recognition was only present in controls and was based on an area involved in face-identity processing. These findings challenge current unisensory models of speech processing, because they show that, in auditory-only speech, the brain exploits previously encoded audiovisual correlations to optimize communication. We suggest that this optimization is based on speaker-specific audiovisual internal models, which are used to simulate a talking face.
Pages: 6747-6752
Page count: 6