Timing in audiovisual speech perception: A mini review and new psychophysical data

Cited by: 17

Authors:

Venezia, Jonathan H. [1]

Thurman, Steven M. [2]

Matchin, William [3]

George, Sahara E. [4]

Hickok, Gregory [1]

Affiliations:

[1] Univ Calif Irvine, Dept Cognit Sci, Irvine, CA 92697 USA

[2] Univ Calif Los Angeles, Dept Psychol, Los Angeles, CA USA

[3] Univ Maryland, Dept Linguist, Baltimore, MD 21201 USA

[4] Univ Calif Irvine, Dept Anat & Neurobiol, Irvine, CA 92717 USA

Source:

ATTENTION PERCEPTION & PSYCHOPHYSICS | 2016 / Vol. 78 / Issue 02

Keywords:

Audiovisual speech; Multisensory integration; Prediction; Classification image; Timing; McGurk; Speech kinematics; MULTISENSORY INTEGRATION; VISUAL SPEECH; SPATIOTEMPORAL DYNAMICS; SUPERIOR COLLICULUS; MOVEMENT VELOCITY; TEMPORAL WINDOW; RECOGNITION; INFORMATION; TIME; SYNCHRONY;

DOI:

10.3758/s13414-015-1026-y

Chinese Library Classification number:

B84 [Psychology]

Discipline classification codes:

04; 0402

Abstract:

Recent influential models of audiovisual speech perception suggest that visual speech aids perception by generating predictions about the identity of upcoming speech sounds. These models place stock in the assumption that visual speech leads auditory speech in time. However, it is unclear whether and to what extent temporally leading visual speech information contributes to perception. Previous studies exploring audiovisual-speech timing have relied upon psychophysical procedures that require artificial manipulation of cross-modal alignment or stimulus duration. We introduce a classification procedure that tracks perceptually relevant visual speech information in time without requiring such manipulations. Participants were shown videos of a McGurk syllable (auditory /apa/ + visual /aka/ = perceptual /ata/) and asked to perform phoneme identification (/apa/ yes-no). The mouth region of the visual stimulus was overlaid with a dynamic transparency mask that obscured visual speech in some frames but not others randomly across trials. Variability in participants' responses (~35% identification of /apa/ compared to ~5% in the absence of the masker) served as the basis for classification analysis. The outcome was a high-resolution spatiotemporal map of perceptually relevant visual features. We produced these maps for McGurk stimuli at different audiovisual temporal offsets (natural timing, 50-ms visual lead, and 100-ms visual lead). Briefly, temporally leading (~130 ms) visual information did influence auditory perception. Moreover, several visual features influenced perception of a single speech sound, with the relative influence of each feature depending on both its temporal relation to the auditory signal and its informational content.

Citations

Pages: 583-601

Page count: 19

112 entries in total

[51] Kleiner M, 2007, PERCEPTION, V36, P14

[52] Articulatory organization of mandibular, labial, and velar movements during speech [J]. Kollia, HB; Gracco, VL; Harris, KS. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1995, 98 (03): 1313-1324

[53] Investigating the impact of lip visibility and talking style on speechreading performance [J]. Lander, Karen; Capek, Cheryl. SPEECH COMMUNICATION, 2013, 55 (05): 600-605

[54] Control of oral closure in lingual stop consonant production [J]. Löfqvist, A; Gracco, VL. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2002, 111 (06): 2811-2827

[55] Löfqvist A, 1999, J ACOUST SOC AM, V105, P1864, DOI 10.1121/1.426723

[56] Auditory Cortex Tracks Both Auditory and Visual Stimulus Dynamics Using Low-Frequency Neuronal Phase Modulation [J]. Luo, Huan; Liu, Zuxiang; Poeppel, David. PLOS BIOLOGY, 2010, 8 (08): 25-26

[57] MacLeod A, 1987, British Journal of Audiology, V21, P131, DOI 10.3109/03005368709077786

[58] Causal inference of asynchronous audiovisual speech [J]. Magnotti, John F.; Ma, Wei Ji; Beauchamp, Michael S. FRONTIERS IN PSYCHOLOGY, 2013, 4

[59] Audiovisual Asynchrony Detection in Human Speech [J]. Maier, Joost X.; Di Luca, Massimiliano; Noppeney, Uta. JOURNAL OF EXPERIMENTAL PSYCHOLOGY-HUMAN PERCEPTION AND PERFORMANCE, 2011, 37 (01): 245-256

[60] Massaro D.W., 1987, Speech perception by ear and eye: A paradigm for psychological inquiry
