Robot Command Interface Using an Audio-Visual Speech Recognition System

Cited by: 0
Authors
Ceballos, Alexander [1, 2]
Gomez, Juan [2]
Prieto, Flavio [3]
Redarce, Tanneguy [4]
Affiliations
[1] Inst Tecnol Metropolitano, Medellin, Colombia
[2] Univ Nacional Colombia Sede Manizales, DIEEC, Manizales, Colombia
[3] Univ Nacional Colombia Sede Bogota, DIMM, Bogota, Colombia
[4] Inst Natl Sci Appliquees Lyon, Lyon, France
Source
PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, PROCEEDINGS | 2009 / Vol. 5856
Keywords
Speech recognition; MPEG-4; manipulator; laparoscopic surgery
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, audio-visual speech recognition has emerged as an active field of research, thanks to advances in pattern recognition, signal processing, and machine vision. Its ultimate goal is to allow human-computer communication by voice while also exploiting the visual information contained in the audio-visual speech signal. This paper presents an automatic command recognition system that uses audio-visual information. The system is intended to control the da Vinci laparoscopic robot. The audio signal is parametrized using Mel Frequency Cepstral Coefficients (MFCC). In addition, visual speech information is extracted from features based on the points that define the outer contour of the mouth according to the MPEG-4 standard.
Pages: 869 / +
Number of pages: 3