Audio-visual speech scene analysis: Characterization of the dynamics of unbinding and rebinding the McGurk effect

Cited by: 25
Authors
Nahorna, Olha [1 ]
Berthommier, Frederic [1 ]
Schwartz, Jean-Luc [1 ]
Affiliations
[1] Grenoble Univ, CNRS, Speech & Cognit Dept, GIPSA Lab,UMR 5216, Grenoble, France
Funding
European Research Council;
Keywords
VISUAL SPEECH; SPATIAL ATTENTION; AUDITORY SPEECH; BIMODAL SPEECH; PERCEPTION; INTEGRATION; INFORMATION; DECISIONS; VOICES; INTELLIGIBILITY;
DOI
10.1121/1.4904536
Chinese Library Classification
O42 [Acoustics];
Subject Classification Codes
070206 ; 082403 ;
Abstract
While audiovisual interactions in speech perception have long been considered automatic, recent data suggest that this is not the case. In a previous study, Nahorna et al. [(2012). J. Acoust. Soc. Am. 132, 1061-1077] showed that the McGurk effect is reduced by a preceding incoherent audiovisual context. This was interpreted as evidence for an audiovisual binding stage controlling the fusion process: incoherence would produce unbinding and decrease the weight of the visual input in fusion. The present paper explores the audiovisual binding system to characterize its dynamics. A first experiment assesses the dynamics of unbinding and shows that it is rapid: an incoherent context less than 0.5 s long (typically one syllable) suffices to produce a maximal reduction in the McGurk effect. A second experiment tests the rebinding process by presenting a short period of either coherent material or silence after the incoherent unbinding context. Coherence provides rebinding, with a recovery of the McGurk effect, while silence provides no rebinding and hence freezes the unbinding process. These experiments are interpreted in the framework of an audiovisual speech scene analysis process that assesses the perceptual organization of an audiovisual speech input before a decision takes place at a higher processing stage. (C) 2015 Acoustical Society of America.
Pages: 362-377 (16 pages)
Related Papers (50 total)
  • [41] Large Scale Functional Brain Networks Underlying Temporal Integration of Audio-Visual Speech Perception: An EEG Study
    Kumar, G. Vinodh
    Halder, Tamesh
    Jaiswal, Amit K.
    Mukherjee, Abhishek
    Roy, Dipanjan
    Banerjee, Arpan
    FRONTIERS IN PSYCHOLOGY, 2016, 7
  • [42] Audio-visual word prominence detection from clean and noisy speech
    Heckmann, Martin
    COMPUTER SPEECH AND LANGUAGE, 2018, 48 : 15 - 30
  • [43] Estimation of Ideal Binary Mask for Audio-Visual Monaural Speech Enhancement
    Balasubramanian, S.
    Rajavel, R.
    Kar, Asutosh
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2023, 42 (09) : 5313 - 5337
  • [44] Influence of native language phonetic system on audio-visual speech perception
    Wang, Yue
    Behne, Dawn M.
    Jiang, Haisheng
    JOURNAL OF PHONETICS, 2009, 37 (03) : 344 - 356
  • [45] Audio-Visual Fusion With Temporal Convolutional Attention Network for Speech Separation
    Liu, Debang
    Zhang, Tianqi
    Christensen, Mads Graesboll
    Yi, Chen
    An, Zeliang
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 4647 - 4660
  • [46] Speech Reaction Time Measurements for the Evaluation of Audio-Visual Spatial Coherence
    Stenzel, Hanne
    Jackson, Philip J. B.
    Francombe, Jon
    2017 NINTH INTERNATIONAL CONFERENCE ON QUALITY OF MULTIMEDIA EXPERIENCE (QOMEX), 2017
  • [47] Lip Tracking Method for the System of Audio-Visual Polish Speech Recognition
    Kubanek, Mariusz
    Bobulski, Janusz
    Adrjanowicz, Lukasz
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, PT I, 2012, 7267 : 535 - 542
  • [48] Acoustic scene complexity affects motion behavior during speech perception in audio-visual multi-talker virtual environments
    Slomianka, Valeska
    Dau, Torsten
    Ahrens, Axel
    SCIENTIFIC REPORTS, 2024, 14 (01)
  • [49] The temporal dynamics of conscious and unconscious audio-visual semantic integration
    Gao, Mingjie
    Zhu, Weina
    Drewes, Jan
    HELIYON, 2024, 10 (13)
  • [50] Acoustic and visual phonetic features in the McGurk effect - an audiovisual speech illusion
    Tiippana, Kaisa
    Tiainen, Mikko
    Vainio, Lari
    Vainio, Martti
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1633 - 1637