Integrating mechanisms of visual guidance in naturalistic language production

Cited by: 0
Authors
Moreno I. Coco
Frank Keller
Affiliations
[1] Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh
[2] Faculdade de Psicologia, Universidade de Lisboa
Source
Cognitive Processing | 2015 / Vol. 16
Keywords
Eye movements; Language production; Scene understanding; Cross-modal processing; Eye–voice span; Structural guidance
Abstract
Situated language production requires the integration of visual attention and linguistic processing. Previous work has not conclusively disentangled the roles of perceptual scene information and structural sentence information in guiding visual attention. In this paper, we present an eye-tracking study demonstrating that three types of guidance (perceptual, conceptual, and structural) interact to control visual attention. In a cued language production experiment, we manipulate perceptual guidance (scene clutter) and conceptual guidance (cue animacy) and measure structural guidance (syntactic complexity of the utterance). Analysis of the time course of language production, before and during speech, reveals that all three forms of guidance affect the complexity of visual responses, quantified in terms of the entropy of attentional landscapes and the turbulence of scan patterns, especially during speech. We find that perceptual and conceptual guidance mediate the distribution of attention in the scene, whereas structural guidance closely relates to scan pattern complexity. Furthermore, the eye–voice spans of the cued object and its perceptual competitor are similar, and their latency is mediated by both perceptual and structural guidance. These results rule out a strict interpretation of structural guidance as the single dominant form of visual guidance in situated language production. Rather, the phase of the task and the associated demands of cross-modal cognitive processing determine the mechanisms that guide attention.
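The abstract quantifies the spread of attention as the entropy of an attentional landscape but does not say how that quantity is computed. As a rough illustration only, the sketch below bins fixation coordinates on a regular grid and takes the Shannon entropy of the resulting distribution. The function name `attention_entropy`, the grid binning, and the `cell` size are assumptions made for this example and are not taken from the paper, which may build its landscapes differently.

```python
import math
from collections import Counter


def attention_entropy(fixations, width, height, cell=100):
    """Shannon entropy (in bits) of fixations binned on a regular grid.

    Hypothetical helper for illustration; not the paper's procedure.
    fixations: iterable of (x, y) pixel coordinates.
    width, height: scene size in pixels; fixations outside are ignored.
    cell: side length of a square grid cell in pixels (assumed value).
    """
    counts = Counter()
    for x, y in fixations:
        if 0 <= x < width and 0 <= y < height:
            counts[(int(x // cell), int(y // cell))] += 1

    total = sum(counts.values())
    if total == 0:
        return 0.0

    # H = -sum(p * log2(p)) over occupied grid cells
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Attention concentrated in one cell versus spread over four cells
focused = [(10, 10), (20, 30), (50, 60), (90, 90)]
spread = [(10, 10), (150, 10), (10, 150), (150, 150)]
print(attention_entropy(focused, 800, 600))
print(attention_entropy(spread, 800, 600))
```

Under these assumptions, lower values indicate attention concentrated on few regions and higher values indicate attention distributed more evenly across the scene.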
Pages: 131-150
Number of pages: 19