Audio-visual speech scene analysis: Characterization of the dynamics of unbinding and rebinding the McGurk effect

Cited by: 25
Authors
Nahorna, Olha [1 ]
Berthommier, Frederic [1 ]
Schwartz, Jean-Luc [1 ]
Affiliations
[1] Grenoble Univ, CNRS, Speech & Cognit Dept, GIPSA-Lab, UMR 5216, Grenoble, France
Funding
European Research Council
Keywords
VISUAL SPEECH; SPATIAL ATTENTION; AUDITORY SPEECH; BIMODAL SPEECH; PERCEPTION; INTEGRATION; INFORMATION; DECISIONS; VOICES; INTELLIGIBILITY;
DOI
10.1121/1.4904536
CLC number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
While audiovisual interactions in speech perception have long been considered automatic, recent data suggest that this is not the case. In a previous study, Nahorna et al. [(2012). J. Acoust. Soc. Am. 132, 1061-1077] showed that the McGurk effect is reduced by a preceding incoherent audiovisual context. This was interpreted as showing the existence of an audiovisual binding stage controlling the fusion process: incoherence would produce unbinding and decrease the weight of the visual input in fusion. The present paper explores the audiovisual binding system to characterize its dynamics. A first experiment assesses the dynamics of unbinding and shows that it is rapid: an incoherent context less than 0.5 s long (typically one syllable) suffices to produce a maximal reduction in the McGurk effect. A second experiment tests the rebinding process by presenting a short period of either coherent material or silence after the incoherent unbinding context. Coherence produces rebinding, with a recovery of the McGurk effect, while silence produces no rebinding and hence freezes the unbinding process. These experiments are interpreted in the framework of an audiovisual speech scene analysis process that assesses the perceptual organization of an audiovisual speech input before decision takes place at a higher processing stage. © 2015 Acoustical Society of America.
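The binding account sketched in the abstract lends itself to a simple quantitative illustration. Below is a minimal toy simulation in Python, not the authors' model: a scalar binding weight w (1 = fully bound, 0 = unbound) decays exponentially under incoherent context, relaxes back toward 1 under coherent context, and stays frozen during silence. The function name, the time constants (tau_unbind, tau_rebind), and the exponential form are all illustrative assumptions, chosen only so that unbinding is near-complete within roughly 0.5 s, as the abstract reports.

    # Toy sketch of the unbinding/rebinding dynamics described above.
    # Not the authors' model: the exponential form and all time constants
    # are assumptions made for illustration only.

    def step_binding_weight(w, context, dt, tau_unbind=0.15, tau_rebind=0.5):
        """Advance the binding weight w (0 = unbound, 1 = fully bound) by dt seconds.

        context: "incoherent" decays w toward 0 (unbinding),
                 "coherent" relaxes w toward 1 (rebinding),
                 "silence" leaves w unchanged (frozen unbinding state).
        """
        if context == "incoherent":
            w += (0.0 - w) * dt / tau_unbind   # exponential decay toward 0
        elif context == "coherent":
            w += (1.0 - w) * dt / tau_rebind   # exponential recovery toward 1
        # "silence": no update, mirroring the finding that silence gives no rebinding
        return w

    if __name__ == "__main__":
        dt, w, t = 0.05, 1.0, 0.0
        # 1 s of incoherent context, then 1 s of silence, then 1 s of coherent material
        for context, duration in [("incoherent", 1.0), ("silence", 1.0), ("coherent", 1.0)]:
            for _ in range(round(duration / dt)):
                w = step_binding_weight(w, context, dt)
                t += dt
            print(f"t={t:.1f}s after {context:>10}: binding weight w = {w:.2f}")

In such a scheme the strength of the McGurk effect would scale with w: the incoherent segment drives w near 0 (reduced McGurk effect), silence holds it there, and the coherent segment restores it, matching the qualitative pattern of the two experiments.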
Pages: 362-377
Page count: 16
Related articles
50 records in total; entries [31]-[40] shown
  • [31] Audio-Visual Speech Timing Sensitivity Is Enhanced in Cluttered Conditions
    Roseboom, Warrick
    Nishida, Shin'ya
    Fujisaki, Waka
    Arnold, Derek H.
    PLOS ONE, 2011, 6 (04)
  • [32] Event-related potentials associated with somatosensory effect in audio-visual speech perception
    Ito, Takayuki
    Ohashi, Hiroki
    Montas, Eva
    Gracco, Vincent L.
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017 : 669 - 673
  • [33] Deep-learning-based audio-visual speech enhancement in presence of Lombard effect
    Michelsanti, Daniel
    Tan, Zheng-Hua
    Sigurdsson, Sigurdur
    Jensen, Jesper
    SPEECH COMMUNICATION, 2019, 115 : 38 - 50
  • [34] Infant perception of audio-visual speech synchrony in familiar and unfamiliar fluent speech
    Pons, Ferran
    Lewkowicz, David J.
    ACTA PSYCHOLOGICA, 2014, 149 : 142 - 147
  • [35] Audio-visual speech perception in prelingually deafened Japanese children following sequential bilateral cochlear implantation
    Yamamoto, Ryosuke
    Naito, Yasushi
    Tona, Risa
    Moroto, Saburo
    Tamaya, Rinko
    Fujiwara, Keizo
    Shinohara, Shogo
    Takebayashi, Shinji
    Kikuchi, Masahiro
    Michida, Tetsuhiko
    INTERNATIONAL JOURNAL OF PEDIATRIC OTORHINOLARYNGOLOGY, 2017, 102 : 160 - 168
  • [36] Neural dynamics driving audio-visual integration in autism
    Ronconi, Luca
    Vitale, Andrea
    Federici, Alessandra
    Mazzoni, Noemi
    Battaglini, Luca
    Molteni, Massimo
    Casartelli, Luca
    CEREBRAL CORTEX, 2023, 33 (03) : 543 - 556
  • [37] Immersive audio-visual scene reproduction using semantic scene reconstruction from 360 cameras
    Kim, Hansung
    Remaggi, Luca
    Dourado, Aloisio
    de Campos, Teofilo
    Jackson, Philip J. B.
    Hilton, Adrian
    VIRTUAL REALITY, 2022, 26 (03) : 823 - 838
  • [38] The 'Audio-Visual Face Cover Corpus': Investigations into audio-visual speech and speaker recognition when the speaker's face is occluded by facewear
    Fecher, Natalie
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012 : 2247 - 2250
  • [39] Involvement of Right STS in Audio-Visual Integration for Affective Speech Demonstrated Using MEG
    Hagan, Cindy C.
    Woods, Will
    Johnson, Sam
    Green, Gary G. R.
    Young, Andrew W.
    PLOS ONE, 2013, 8 (08)
  • [40] Effects of hearing loss and audio-visual cues on children's speech processing speed
    Holt, Rebecca
    Bruggeman, Laurence
    Demuth, Katherine
    SPEECH COMMUNICATION, 2023, 146 : 11 - 21