Simple Acoustic Features Can Explain Phoneme-Based Predictions of Cortical Responses to Speech

Cited by: 95
Authors
Daube, Christoph [1]
Ince, Robin A. A. [1]
Gross, Joachim [1,2]
Affiliations
[1] Univ Glasgow, Inst Neurosci & Psychol, 62 Hillhead St, Glasgow G12 8QB, Lanark, Scotland
[2] Univ Munster, Inst Biomagnetism & Biosignalanal, Malmedyweg 15, D-48149 Munster, Germany
Funding
Wellcome Trust (UK);
Keywords
INFORMATION; DYNAMICS; COMPREHENSION; OSCILLATIONS; PRINCIPLES; FREQUENCY; ENVELOPE; MODELS; MEG;
DOI
10.1016/j.cub.2019.04.067
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject Classification Codes
071010; 081704;
Abstract
When we listen to speech, we have to make sense of a waveform of sound pressure. Hierarchical models of speech perception assume that, to extract semantic meaning, the signal is transformed into unknown, intermediate neuronal representations. Traditionally, studies of such intermediate representations are guided by linguistically defined concepts, such as phonemes. Here, we argue that in order to arrive at an unbiased understanding of the neuronal responses to speech, we should focus instead on representations obtained directly from the stimulus. We illustrate our view with a data-driven, information-theoretic analysis of a dataset of 24 young, healthy humans who listened to a 1 h narrative while their magnetoencephalogram (MEG) was recorded. We find that two recent results (the improved performance of an encoding model in which annotated linguistic and acoustic features were combined, and the decoding of phoneme subgroups from phoneme-locked responses) can be explained by an encoding model that is based entirely on acoustic features. These acoustic features capitalize on acoustic edges and outperform Gabor-filtered spectrograms, which can explicitly describe the spectrotemporal characteristics of individual phonemes. By replicating our results in publicly available electroencephalography (EEG) data, we conclude that models of brain responses based on linguistic features can serve as excellent benchmarks. However, we believe that in order to further our understanding of human cortical responses to speech, we should also explore low-level and parsimonious explanations for apparent high-level phenomena.
Pages: 1924+
Page count: 23
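
The abstract above describes forward ("encoding") models that predict a recorded MEG/EEG signal from lagged copies of stimulus features, and compares feature sets such as acoustic edges against Gabor-filtered spectrograms. The following minimal Python sketch (numpy/scikit-learn) illustrates that model class only, not the authors' pipeline: it derives a crude "acoustic edge" feature as the half-wave-rectified derivative of a speech envelope and fits a cross-validated ridge regression to a single channel. The sampling rate, lag window, and regularization strength are illustrative assumptions, not values from the paper.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

fs = 100  # Hz; assumed common sampling rate of stimulus features and brain signal

def lagged_design(x, lags):
    # Stack time-shifted copies of a 1-D feature into a design matrix,
    # one column per non-negative lag (in samples).
    X = np.zeros((len(x), len(lags)))
    for j, lag in enumerate(lags):
        X[lag:, j] = x[:len(x) - lag] if lag > 0 else x
    return X

def envelope_onsets(env):
    # Half-wave-rectified first derivative of the envelope: a simple
    # "acoustic edge" feature of the kind the abstract refers to.
    return np.maximum(np.diff(env, prepend=env[0]), 0.0)

def cv_prediction_corr(feature, response, lags, alpha=1e3, n_splits=5):
    # Cross-validated Pearson correlation between predicted and measured
    # response; contiguous folds respect the time-series structure.
    X = lagged_design(feature, lags)
    scores = []
    for train, test in KFold(n_splits=n_splits).split(X):
        pred = Ridge(alpha=alpha).fit(X[train], response[train]).predict(X[test])
        scores.append(np.corrcoef(pred, response[test])[0, 1])
    return float(np.mean(scores))

# Toy demonstration on synthetic data (replace env/meg with a real speech
# envelope and a real MEG/EEG channel at matching sampling rates).
rng = np.random.default_rng(0)
env = np.abs(rng.standard_normal(60 * fs)).cumsum() % 1.0  # fake 60 s envelope
onsets = envelope_onsets(env)
lags = np.arange(0, int(0.4 * fs))                         # 0-400 ms, assumed
meg = np.convolve(onsets, rng.standard_normal(20), mode="same")
meg = meg + 0.5 * rng.standard_normal(len(meg))            # fake noisy response
print("edge-feature encoding model, mean CV r =",
      cv_prediction_corr(onsets, meg, lags))

Running cv_prediction_corr on a competing feature set (for example, the channels of a Gabor-filtered spectrogram stacked into the same lagged design matrix) and comparing the cross-validated correlations is the basic form of the model comparison the abstract reports; the paper's actual analysis additionally uses information-theoretic measures and richer feature sets.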