Human Perception of Audio-Visual Synthetic Character Emotion Expression in the Presence of Ambiguous and Conflicting Information

Cited by: 40
Authors
Mower, Emily [1]
Mataric, Maja J. [2]
Narayanan, Shrikanth [1,2]
Affiliations
[1] Univ So Calif, Dept Elect Engn, Los Angeles, CA 90089 USA
[2] Univ So Calif, Dept Comp Sci, Los Angeles, CA 90089 USA
Funding
U.S. National Science Foundation
Keywords
Expressive animation; multimodality; synthesis; recognition; integration; ear
DOI
10.1109/TMM.2009.2021722
CLC Number (Chinese Library Classification)
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
Computer-simulated avatars and humanoid robots have an increasingly prominent place in today's world. Acceptance of these synthetic characters depends on their ability to properly and recognizably convey basic emotional states to a user population. This study presents an analysis of the interaction between emotional audio (human voice) and video (simple animation) cues. The emotional relevance of the channels is analyzed with respect to their effect on human perception and through the study of the extracted audio-visual features that contribute most prominently to that perception. As a result of the unequal expressivity of the two channels, the audio was shown to bias the evaluators' perception. However, even in the presence of a strong audio bias, the video data were shown to affect human perception. The feature sets extracted from emotionally matched audio-visual displays contained both audio and video features, while the feature sets resulting from emotionally mismatched displays contained only audio information. This result indicates that observers integrate natural audio cues and synthetic video cues only when the information expressed across the two channels is congruent. It is therefore important to design the presentation of audio-visual cues carefully, since an incorrect design may cause observers to ignore the information conveyed in one of the channels.
Pages: 843-855
Number of pages: 13
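
To make the analysis described in the abstract concrete, below is a minimal, hypothetical sketch (not the authors' actual pipeline) of how one might compare the predictive weight of audio versus video features under congruent and conflicting audio-visual displays. The feature names (pitch_mean, eyebrow_raise, etc.), the simulated data generator, and the use of logistic-regression coefficient magnitudes as a stand-in for the paper's feature-selection analysis are all assumptions for illustration.

```python
# Hypothetical sketch of a congruent-vs-conflicting feature analysis.
# Everything here (feature names, data generator, model choice) is
# invented for illustration; it is not the paper's actual method.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 400
AUDIO = ["pitch_mean", "pitch_range", "energy", "speaking_rate"]
VIDEO = ["eyebrow_raise", "mouth_open", "head_tilt"]
FEATURES = AUDIO + VIDEO

def simulate(congruent, n):
    """Simulate perceived-emotion labels (0 = sad, 1 = happy).

    Assumption baked into the simulation: audio features always track the
    perceived label (the audio bias the abstract reports), while video
    features track it only when the two channels are congruent.
    """
    y = rng.integers(0, 2, n)
    X = rng.normal(size=(n, len(FEATURES)))
    X[:, :len(AUDIO)] += 1.0 * y[:, None]       # audio: always informative
    if congruent:
        X[:, len(AUDIO):] += 0.8 * y[:, None]   # video: informative only if congruent
    return X, y

for congruent in (True, False):
    X, y = simulate(congruent, n)
    clf = LogisticRegression(max_iter=1000).fit(
        StandardScaler().fit_transform(X), y)
    w = np.abs(clf.coef_[0])
    audio_w, video_w = w[:len(AUDIO)].mean(), w[len(AUDIO):].mean()
    label = "congruent" if congruent else "conflicting"
    print(f"{label:11s} mean |coef|: audio={audio_w:.2f} video={video_w:.2f}")
```

Under these simulated assumptions, the congruent condition yields sizable coefficient magnitudes for both channels, while the conflicting condition leaves only the audio features with large weights, mirroring the abstract's finding that mismatched displays produced audio-only feature sets.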