Audio-visual onset differences are used to determine syllable identity for ambiguous audio-visual stimulus pairs

Cited by: 30
Authors
ten Oever, Sanne [1 ]
Sack, Alexander T. [1 ]
Wheat, Katherine L. [1 ]
Bien, Nina [1 ,2 ]
van Atteveldt, Nienke [1 ,3 ]
Affiliations
[1] Maastricht Univ, Fac Psychol & Neurosci, NL-6200 MD Maastricht, Netherlands
[2] Univ Luxembourg, EMACS Res Unit, Luxembourg, Luxembourg
[3] Netherlands Inst Neurosci, Neuroimaging & Neuromodeling Grp, Amsterdam, Netherlands
Source
FRONTIERS IN PSYCHOLOGY | 2013, Vol. 4
Keywords
audiovisual; temporal cues; audio-visual onset differences; content cues; predictability; detection; multisensory integration; visual speech; crossmodal binding; neuronal oscillations; auditory cortex; perception; sounds; modulation; synchrony; voices
DOI
10.3389/fpsyg.2013.00331
Chinese Library Classification
B84 [Psychology]
Discipline Code
04; 0402
Abstract
Content and temporal cues have been shown to interact during audio-visual (AV) speech identification. Typically, the most reliable unimodal cue is used more strongly to identify specific speech features; however, visual cues are only used if the AV stimuli are presented within a certain temporal window of integration (TWI). This suggests that temporal cues denote whether unimodal stimuli belong together, that is, whether they should be integrated. It is not known whether temporal cues also provide information about the identity of a syllable. Since spoken syllables have naturally varying AV onset asynchronies, we hypothesize that for suboptimal AV cues presented within the TWI, information about the natural AV onset differences can aid speech identification. To test this, we presented low-intensity auditory syllables concurrently with visual speech signals and varied the stimulus onset asynchronies (SOAs) of the AV pairs, while participants were instructed to identify the auditory syllables. We found that specific speech features (e.g., voicing) were identified by relying primarily on one modality (e.g., auditory). Additionally, we showed a wide window in which visual information influenced auditory perception, which seemed even wider for congruent stimulus pairs. Finally, we found a specific response pattern across the SOA range for syllables that were not reliably identified by the unimodal cues, which we explained as the result of the use of natural onset differences between AV speech signals. This indicates that temporal cues not only provide information about the temporal integration of AV stimuli, but also convey information about the identity of AV pairs. These results provide a detailed behavioral basis for further neuroimaging and stimulation studies to unravel the neurofunctional mechanisms of the audio-visual-temporal interplay in speech perception.
Pages: 13