Selective cortical representation of attended speaker in multi-talker speech perception

Cited by: 673
Authors
Mesgarani, Nima [1, 2]
Chang, Edward F. [1, 2]
Affiliations
[1] Univ Calif San Francisco, UCSF Ctr Integrat Neurosci, Dept Neurol Surg, San Francisco, CA 94143 USA
[2] Univ Calif San Francisco, UCSF Ctr Integrat Neurosci, Dept Physiol, San Francisco, CA 94143 USA
Funding
National Institutes of Health (USA)
Keywords
DOI
10.1038/nature11020
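Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract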
Humans possess a remarkable ability to attend to a single speaker's voice in a multi-talker background(1-3). How the auditory system manages to extract intelligible speech under such acoustically complex and adverse listening conditions is not known, and, indeed, it is not clear how attended speech is internally represented(4,5). Here, using multi-electrode surface recordings from the cortex of subjects engaged in a listening task with two simultaneous speakers, we demonstrate that population responses in non-primary human auditory cortex encode critical features of attended speech: speech spectrograms reconstructed based on cortical responses to the mixture of speakers reveal the salient spectral and temporal features of the attended speaker, as if subjects were listening to that speaker alone. A simple classifier trained solely on examples of single speakers can decode both attended words and speaker identity. We find that task performance is well predicted by a rapid increase in attention-modulated neural selectivity across both single-electrode and population-level cortical responses. These findings demonstrate that the cortical representation of speech does not merely reflect the external acoustic environment, but instead gives rise to the perceptual aspects relevant for the listener's intended goal.
Pages: 233 / U118
Page count: 5
Related papers
29 entries in total
[1]   The cocktail party problem: What is it? How can it be solved? And why should animal behaviorists study it? [J].
Bee, Mark A. ;
Micheyl, Christophe .
JOURNAL OF COMPARATIVE PSYCHOLOGY, 2008, 122 (03) :235-251
[2]   Tuning of the Human Neocortex to the Temporal Dynamics of Attended Events [J].
Besle, Julien ;
Schevon, Catherine A. ;
Mehta, Ashesh D. ;
Lakatos, Peter ;
Goodman, Robert R. ;
McKhann, Guy M. ;
Emerson, Ronald G. ;
Schroeder, Charles E. .
JOURNAL OF NEUROSCIENCE, 2011, 31 (09) :3176-3185
[3]   READING A NEURAL CODE [J].
BIALEK, W ;
RIEKE, F ;
VANSTEVENINCK, RRD ;
WARLAND, D .
SCIENCE, 1991, 252 (5014) :1854-1857
[4]   A speech corpus for multitalker communications research [J].
Bolia, RS ;
Nelson, WT ;
Ericson, MA ;
Simpson, BD .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2000, 107 (02) :1065-1066
[5]   Bregman AS., 1994, AUDITORY SCENE ANAL
[6]   Informational and energetic masking effects in the perception of two simultaneous talkers [J].
Brungart, DS .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2001, 109 (03) :1101-1109
[7]   Categorical speech representation in human superior temporal gyrus [J].
Chang, Edward F. ;
Rieger, Jochem W. ;
Johnson, Keith ;
Berger, Mitchel S. ;
Barbaro, Nicholas M. ;
Knight, Robert T. .
NATURE NEUROSCIENCE, 2010, 13 (11) :1428-U169
[9]   Monaural speech separation and recognition challenge [J].
Cooke, Martin ;
Hershey, John R. ;
Rennie, Steven J. .
COMPUTER SPEECH AND LANGUAGE, 2010, 24 (01) :1-15
[10]   Induced electrocorticographic gamma activity during auditory perception [J].
Crone, NE ;
Boatman, D ;
Gordon, B ;
Hao, L .
CLINICAL NEUROPHYSIOLOGY, 2001, 112 (04) :565-582