Timing in audiovisual speech perception: A mini review and new psychophysical data

Cited by: 17
Authors
Venezia, Jonathan H. [1 ]
Thurman, Steven M. [2 ]
Matchin, William [3 ]
George, Sahara E. [4 ]
Hickok, Gregory [1 ]
Affiliations
[1] Univ Calif Irvine, Dept Cognit Sci, Irvine, CA 92697 USA
[2] Univ Calif Los Angeles, Dept Psychol, Los Angeles, CA USA
[3] Univ Maryland, Dept Linguist, Baltimore, MD 21201 USA
[4] Univ Calif Irvine, Dept Anat & Neurobiol, Irvine, CA 92717 USA
Keywords
Audiovisual speech; Multisensory integration; Prediction; Classification image; Timing; McGurk; Speech kinematics; Visual speech; Spatiotemporal dynamics; Superior colliculus; Movement velocity; Temporal window; Recognition; Information; Time; Synchrony
DOI
10.3758/s13414-015-1026-y
Chinese Library Classification (CLC)
B84 [Psychology]
Subject Classification
04; 0402
Abstract
Recent influential models of audiovisual speech perception suggest that visual speech aids perception by generating predictions about the identity of upcoming speech sounds. These models place stock in the assumption that visual speech leads auditory speech in time. However, it is unclear whether and to what extent temporally leading visual speech information contributes to perception. Previous studies exploring audiovisual-speech timing have relied upon psychophysical procedures that require artificial manipulation of cross-modal alignment or stimulus duration. We introduce a classification procedure that tracks perceptually relevant visual speech information in time without requiring such manipulations. Participants were shown videos of a McGurk syllable (auditory /apa/ + visual /aka/ = perceptual /ata/) and asked to perform phoneme identification (/apa/ yes-no). The mouth region of the visual stimulus was overlaid with a dynamic transparency mask that obscured visual speech in some frames but not others, varying randomly across trials. Variability in participants' responses (~35% identification of /apa/ compared to ~5% in the absence of the masker) served as the basis for classification analysis. The outcome was a high-resolution spatiotemporal map of perceptually relevant visual features. We produced these maps for McGurk stimuli at different audiovisual temporal offsets (natural timing, 50-ms visual lead, and 100-ms visual lead). Briefly, temporally leading (~130 ms) visual information did influence auditory perception. Moreover, several visual features influenced perception of a single speech sound, with the relative influence of each feature depending on both its temporal relation to the auditory signal and its informational content.
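The masking-and-classification logic sketched in the abstract (random per-frame transparency masks, /apa/ yes-no responses, and a classification analysis relating the two) can be illustrated with a short simulation. The following is a minimal sketch under stated assumptions, not the authors' actual pipeline: the trial count, mask resolution, the simulated observer, and the "informative" region are invented for illustration, and the difference-of-means classification image with permutation z-scoring is a generic stand-in for whatever statistics the paper used.

```python
# Minimal sketch of a classification-image analysis in the spirit of the
# abstract; all names and parameters here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_trials = 2000        # trials per observer (assumed)
n_frames = 30          # video frames per stimulus (assumed)
h, w = 16, 16          # coarse spatial grid over the mouth region (assumed)

# Random transparency masks: 1.0 = fully visible, 0.0 = fully obscured.
masks = rng.random((n_trials, n_frames, h, w))

# Simulated observer with a hypothetical "informative" spatiotemporal
# region; obscuring it breaks the McGurk illusion, so /apa/ reports
# (auditory capture) become more likely when visibility there is low.
informative = np.zeros((n_frames, h, w))
informative[8:14, 6:10, 4:12] = 1.0
evidence = (masks * informative).sum(axis=(1, 2, 3))
# The +0.6 offset roughly targets the ~35% /apa/ rate from the abstract.
p_apa = 1.0 / (1.0 + np.exp((evidence - evidence.mean()) / evidence.std() + 0.6))
responses = rng.random(n_trials) < p_apa   # True = reported /apa/

# Classification image: mean mask on /apa/ trials minus mean mask on
# non-/apa/ trials, one value per (frame, pixel).
ci = masks[responses].mean(axis=0) - masks[~responses].mean(axis=0)

# Permutation null: shuffle response labels to z-score the map.
null = np.stack([
    masks[p].mean(axis=0) - masks[~p].mean(axis=0)
    for p in (rng.permutation(responses) for _ in range(200))
])
z = ci / null.std(axis=0)
print("peak |z| per frame:", np.abs(z).reshape(n_frames, -1).max(axis=1).round(1))
```

Because /apa/ reports coincide with the informative features being obscured, reliable features show up as negative classification-image values (lower average visibility on /apa/ trials); plotting |z| across frames would give a temporal profile analogous to the paper's spatiotemporal maps.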
Pages: 583-601
Page count: 19
Related Papers (50 total)
  • [21] Munhall, K. G., Kroos, C., Jozan, G., & Vatikiotis-Bateson, E. Spatial frequency requirements for audiovisual speech perception. Perception & Psychophysics, 2004, 66: 574-583.
  • [22] Fisher, V. L., Dean, C. L., Nave, C. S., Parkins, E. V., Kerkhoff, W. G., & Kwakye, L. D. Increases in sensory noise predict attentional disruptions to audiovisual speech perception. Frontiers in Human Neuroscience, 2023, 16.
  • [23] Altieri, N., & Townsend, J. T. An assessment of behavioral dynamic information processing measures in audiovisual speech perception. Frontiers in Psychology, 2011, 2.
  • [24] Gijbels, L., Yeatman, J. D., Lalonde, K., Doering, P., & Lee, A. K. C. Audiovisual Speech Perception Benefits are Stable from Preschool through Adolescence. Multisensory Research, 2024, 37(4-5): 317-340.
  • [25] Saalasti, S., Kätsyri, J., Tiippana, K., Laine-Hernandez, M., von Wendt, L., & Sams, M. Audiovisual Speech Perception and Eye Gaze Behavior of Adults with Asperger Syndrome. Journal of Autism and Developmental Disorders, 2012, 42: 1606-1615.
  • [26] Alsius, A., Möttönen, R., Sams, M. E., Soto-Faraco, S., & Tiippana, K. Effect of attentional load on audiovisual speech perception: evidence from ERPs. Frontiers in Psychology, 2014, 5.
  • [27] Xie, H., Zeng, B., & Wang, R. Visual Timing Information in Audiovisual Speech Perception: Evidence from Lexical Tone Contour. 19th Annual Conference of the International Speech Communication Association (Interspeech 2018), 2018: 3781-3785.
  • [28] Getz, L. M., Nordeen, E. R., Vrabic, S. C., & Toscano, J. C. Modeling the Development of Audiovisual Cue Integration in Speech Perception. Brain Sciences, 2017, 7(3).
  • [29] Aller, S., & Meister, E. Perception of Audiovisual Speech Produced by Human and Virtual Speaker. Human Language Technologies - The Baltic Perspective, 2016, 289: 31-38.
  • [30] Choi, A., Kim, H., Jo, M., Kim, S., Joung, H., Choi, I., & Lee, K. The impact of visual information in speech perception for individuals with hearing loss: a mini review. Frontiers in Psychology, 2024, 15.