Human Perception of Audio-Visual Synthetic Character Emotion Expression in the Presence of Ambiguous and Conflicting Information

Cited by: 40
Authors
Mower, Emily [1]
Mataric, Maja J. [2]
Narayanan, Shrikanth [1,2]
Affiliations
[1] Univ So Calif, Dept Elect Engn, Los Angeles, CA 90089 USA
[2] Univ So Calif, Dept Comp Sci, Los Angeles, CA 90089 USA
Funding
U.S. National Science Foundation
Keywords
Expressive animation; multimodality; synthesis; recognition; integration; ear
DOI
10.1109/TMM.2009.2021722
CLC Number (Chinese Library Classification)
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
Computer-simulated avatars and humanoid robots have an increasingly prominent place in today's world. Acceptance of these synthetic characters depends on their ability to properly and recognizably convey basic emotional states to a user population. This study presents an analysis of the interaction between emotional audio (human voice) and video (simple animation) cues. The emotional relevance of the channels is analyzed with respect to their effect on human perception and through the study of the extracted audio-visual features that contribute most prominently to that perception. As a result of the unequal expressivity of the two channels, the audio was shown to bias the evaluators' perception. However, even in the presence of a strong audio bias, the video data were shown to affect human perception. The feature sets extracted from emotionally matched audio-visual displays contained both audio and video features, while the feature sets resulting from emotionally mismatched displays contained only audio information. This result indicates that observers integrate natural audio cues and synthetic video cues only when the information expressed across the two channels is congruent. It is therefore important to design the presentation of audio-visual cues carefully, since an incorrect design may cause observers to ignore the information conveyed in one of the channels.
Pages: 843-855
Number of pages: 13
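
To make the analysis described in the abstract concrete, below is a minimal, hypothetical sketch (not the authors' actual pipeline) of how one might compare the predictive weight of audio versus video features under congruent and conflicting audio-visual displays. The feature names (pitch_mean, eyebrow_raise, etc.), the simulated data generator, and the use of logistic-regression coefficient magnitudes as a stand-in for the paper's feature-selection analysis are all assumptions for illustration.

```python
# Hypothetical sketch of a congruent-vs-conflicting feature analysis.
# Everything here (feature names, data generator, model choice) is
# invented for illustration; it is not the paper's actual method.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 400
AUDIO = ["pitch_mean", "pitch_range", "energy", "speaking_rate"]
VIDEO = ["eyebrow_raise", "mouth_open", "head_tilt"]
FEATURES = AUDIO + VIDEO

def simulate(congruent, n):
    """Simulate perceived-emotion labels (0 = sad, 1 = happy).

    Assumption baked into the simulation: audio features always track the
    perceived label (the audio bias the abstract reports), while video
    features track it only when the two channels are congruent.
    """
    y = rng.integers(0, 2, n)
    X = rng.normal(size=(n, len(FEATURES)))
    X[:, :len(AUDIO)] += 1.0 * y[:, None]       # audio: always informative
    if congruent:
        X[:, len(AUDIO):] += 0.8 * y[:, None]   # video: informative only if congruent
    return X, y

for congruent in (True, False):
    X, y = simulate(congruent, n)
    clf = LogisticRegression(max_iter=1000).fit(
        StandardScaler().fit_transform(X), y)
    w = np.abs(clf.coef_[0])
    audio_w, video_w = w[:len(AUDIO)].mean(), w[len(AUDIO):].mean()
    label = "congruent" if congruent else "conflicting"
    print(f"{label:11s} mean |coef|: audio={audio_w:.2f} video={video_w:.2f}")
```

Under these simulated assumptions, the congruent condition yields sizable coefficient magnitudes for both channels, while the conflicting condition leaves only the audio features with large weights, mirroring the abstract's finding that mismatched displays produced audio-only feature sets.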