Recognition of para-linguistic information and its application to spoken dialogue system

Cited by: 8
Authors
Fujie, S [1]
Ejiri, Y [1]
Matsusaka, Y [1]
Kikuchi, H [1]
Kobayashi, T [1]
Affiliation
[1] Waseda Univ, Sch Sci & Engn, Tokyo, Japan
Source
ASRU'03: 2003 IEEE Workshop on Automatic Speech Recognition and Understanding | 2003
DOI
10.1109/ASRU.2003.1318446
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Human-human interaction in spoken dialogue appears to rely not only on the linguistic information in utterances but also on additional information that supports it. We call this additional information "para-linguistic information". In this paper, we present a method for recognizing attitudes from prosodic information and a method for recognizing head gestures. The former distinguishes two attitudes, "positive" and "negative", using the F0 pattern and phoneme alignment as features. The latter distinguishes three gestures, "nod", "tilt" and "shake", using a left-to-right HMM as the probabilistic model and optical flow as features. Experimental results show that these methods are sufficient to recognize the user's attitude as para-linguistic information. Finally, we present a prototype spoken dialogue system that uses para-linguistic information and show how this information contributes to efficient conversation.
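The gesture recognizer described in the abstract can be illustrated with a minimal sketch: one left-to-right HMM per gesture, scored with the forward algorithm, with the highest-likelihood model chosen as the label. Everything concrete here is an assumption for illustration and is not taken from the paper: the discrete symbols standing in for quantized optical-flow directions, the number of states, the transition and emission probabilities, and the function names.

```python
import math

# Hypothetical symbols standing in for quantized optical-flow directions.
STILL, UP, DOWN, LEFT, RIGHT = range(5)

def left_to_right_transitions(n_states, stay=0.6):
    """Transition matrix in which a state may only stay or advance by one."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i == n_states - 1:
            A[i][i] = 1.0          # final state is absorbing
        else:
            A[i][i] = stay
            A[i][i + 1] = 1.0 - stay
    return A

def emission_rows(favoured, n_symbols=5, peak=0.8):
    """One emission row per state, peaked on that state's favoured symbol."""
    rest = (1.0 - peak) / (n_symbols - 1)
    return [[peak if k == f else rest for k in range(n_symbols)]
            for f in favoured]

def log_likelihood(obs, A, B):
    """Scaled forward algorithm; the model is assumed to start in state 0."""
    n = len(A)
    alpha = [B[0][obs[0]]] + [0.0] * (n - 1)
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_p

# Toy, hand-set models (assumed, not trained): each gesture is a short
# sequence of dominant motion directions.
FAVOURED = {
    "nod":   [DOWN, UP, DOWN, UP],
    "shake": [LEFT, RIGHT, LEFT, RIGHT],
    "tilt":  [RIGHT, STILL, LEFT],
}
MODELS = {name: (left_to_right_transitions(len(f)), emission_rows(f))
          for name, f in FAVOURED.items()}

def classify(obs):
    """Return the gesture whose HMM gives the observation the highest likelihood."""
    return max(MODELS, key=lambda name: log_likelihood(obs, *MODELS[name]))

print(classify([DOWN, DOWN, UP, UP, DOWN, UP]))
```

In a real system the model parameters would be estimated from labelled gesture data (for example with Baum-Welch) and the observations would be continuous optical-flow features rather than five hand-picked symbols.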
Pages: 231-236
Page count: 6