Multi-time resolution analysis of speech: evidence from psychophysics

Cited by: 39
Authors
Chait, Maria [1, 2]
Greenberg, Steven [3]
Arai, Takayuki [4]
Simon, Jonathan Z. [1, 5, 6, 7]
Poeppel, David [1, 2, 8, 9]
Affiliations
[1] Univ Maryland, Neurosci & Cognit Sci Program, College Pk, MD 20742 USA
[2] Univ Maryland, Dept Linguist, College Pk, MD 20742 USA
[3] Silicon Speech, Hidden Valley Lake, CA USA
[4] Sophia Univ, Dept Informat & Commun Sci, Tokyo 102, Japan
[5] Univ Maryland, Dept Biol, College Pk, MD 20742 USA
[6] Univ Maryland, Dept Elect & Comp Engn, College Pk, MD 20742 USA
[7] Univ Maryland, Syst Res Inst, College Pk, MD 20742 USA
[8] NYU, Dept Psychol, New York, NY 10003 USA
[9] Max Planck Inst, Dept Neurosci, Frankfurt, Germany
Keywords
speech perception; speech segmentation; temporal processing; modulation spectrum; auditory processing; syllable; phoneme; MODULATION TRANSFER-FUNCTIONS; TIME-COMPRESSED SPEECH; AUDITORY-CORTEX; PERCEPTUAL ADJUSTMENT; TEMPORAL INTEGRATION; CORTICAL RESPONSES; ACOUSTIC LANDMARKS; NATURAL SOUNDS; COMPREHENSION; INTELLIGIBILITY
DOI
10.3389/fnins.2015.00214
Chinese Library Classification number
Q189 [Neuroscience]
Discipline classification code
071006
Abstract
How speech signals are analyzed and represented remains a foundational challenge for both cognitive science and neuroscience. A growing body of research, employing various behavioral and neurobiological experimental techniques, now points to the perceptual relevance of both phoneme-sized (10-40 Hz modulation frequency) and syllable-sized (2-10 Hz modulation frequency) units in speech processing. However, it is not clear how information associated with such different time scales interacts in a manner relevant for speech perception. We report behavioral experiments on speech intelligibility employing a stimulus that allows us to investigate how distinct temporal modulations in speech are treated separately and whether they are combined. We created sentences in which the slow (~4 Hz; S_low) and rapid (~33 Hz; S_high) modulations were selectively extracted; these correspond to ~250 and ~30 ms, the average duration of syllables and certain phonetic properties, respectively. Although S_low and S_high have low intelligibility when presented separately, dichotic presentation of S_high with S_low results in supra-additive performance, suggesting a synergistic relationship between low- and high-modulation frequencies. A second experiment desynchronized presentation of the S_low and S_high signals. Desynchronizing the signals relative to one another had no impact on intelligibility when delays were less than 45 ms. Longer delays resulted in a steep intelligibility decline, providing further evidence of integration or binding of information within restricted temporal windows. Our data suggest that human speech perception uses multi-time resolution processing. Signals are concurrently analyzed on at least two separate time scales, the intermediate representations of these analyses are integrated, and the resulting bound percept has significant consequences for speech intelligibility, a view compatible with recent insights from neuroscience implicating multi-timescale auditory processing.
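As a rough illustration of how the modulation frequencies in the abstract relate to the quoted durations, the length of one modulation cycle is the reciprocal of its frequency. The short Python sketch below assumes only that relationship; the function name `modulation_period_ms` is hypothetical and is not taken from the paper.

```python
def modulation_period_ms(frequency_hz: float) -> float:
    """Return the duration of one modulation cycle in milliseconds.

    Illustrative helper only (hypothetical name, not from the paper):
    period = 1 / frequency, converted from seconds to milliseconds.
    """
    if frequency_hz <= 0:
        raise ValueError("frequency_hz must be positive")
    return 1000.0 / frequency_hz


# Approximate modulation frequencies quoted in the abstract.
for label, freq_hz in (("S_low", 4.0), ("S_high", 33.0)):
    period = modulation_period_ms(freq_hz)
    print(f"{label}: ~{freq_hz:g} Hz -> ~{period:.0f} ms per cycle")
```

Under this reading, ~4 Hz corresponds to roughly 250 ms per cycle and ~33 Hz to roughly 30 ms, matching the syllable-scale and phonetic-scale durations the abstract mentions.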
Pages: 10