From a Wizard of Oz experiment to a real time speech and gesture multimodal interface

Cited by: 15
Authors
Carbini, S. [1]
Delphin-Poulat, L. [1]
Perron, L. [1]
Viallet, J. E. [1]
Affiliations
[1] France Telecom, R&D, F-22307 Lannion, France
Keywords
interpersonal communication; mediated collaboration; non-verbal behaviour; non-intrusive multimodal interface; context dependent interpretation; bimanual gesture; speech recognition; head; hands detection; tracking;
DOI
10.1016/j.sigpro.2006.04.001
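Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract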
This paper describes a Wizard of Oz cooperative storytelling experiment named Virstory, in which user speech-gesture actions are interpreted in order to cooperatively build a story with another person, the partner of the interpreter. The gesture, speech and multimodal behaviours of 20 subjects are detailed. The multimodal oral with gesture large display interface (MOWGLI) is then described. It is an oral and gesture multimodal human-computer interface that allows users to interact remotely in real time. Continuous pointing direction and other discrete hand selection gestures are recognized by computer vision tracking of the user's head and hands. By associating gesture recognition with speech recognition of oral selection and deselection commands, MOWGLI behaves as a virtual, contactless, application-independent multimodal mouse. Discrete pointing locations corresponding to discrete speech or gesture selection time events are extracted from the continuous pointing process. A large vocabulary related to a chess game application allows shorter and more specific multimodal commands, such as pointing at a desired location (there) and uttering a piece move oral command without needing a previous pointing gesture to another piece location, whereas generic "Put that there" commands need two successive pointing locations ((that) and (there)). Contextual constraints such as the displacement rules of pieces and the current game position allow interpretation of ambiguous commands and lead to shorter multimodal commands. © 2006 Elsevier B.V. All rights reserved.
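The abstract says that discrete pointing locations are extracted from the continuous pointing process at the times of speech or gesture selection events, but it does not say how. The following is a minimal sketch of one plausible way to do this, assuming timestamped pointing samples and linear interpolation between the two samples around each event. The names `Sample` and `pointing_at` are hypothetical and are not taken from the paper.

```python
from bisect import bisect_left
from typing import List, Tuple

# Hypothetical sample layout: (time in seconds, x, y) on the display.
Sample = Tuple[float, float, float]


def pointing_at(samples: List[Sample], t_event: float) -> Tuple[float, float]:
    """Return the pointed (x, y) at t_event.

    Assumes samples are sorted by strictly increasing time. Linear
    interpolation is an illustrative choice, not the paper's method.
    """
    if not samples:
        raise ValueError("no pointing samples")
    times = [s[0] for s in samples]
    i = bisect_left(times, t_event)  # first sample with time >= t_event
    if i == 0:
        return samples[0][1:]        # event before the first sample: clamp
    if i == len(samples):
        return samples[-1][1:]       # event after the last sample: clamp
    t0, x0, y0 = samples[i - 1]
    t1, x1, y1 = samples[i]
    w = (t_event - t0) / (t1 - t0)   # interpolation weight in (0, 1]
    return (x0 + w * (x1 - x0), y0 + w * (y1 - y0))


# Continuous pointing track and two selection events (illustrative values).
track = [(0.0, 0.0, 0.0), (1.0, 10.0, 20.0), (2.0, 10.0, 40.0)]
selections = [pointing_at(track, t) for t in (0.5, 1.5)]
```

In a "Put that there" command, the two entries of `selections` would play the roles of (that) and (there); a real system would also need to compensate for the delay between the gesture and the recognized speech event, which this sketch ignores.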
Pages: 3559-3577
Page count: 19