Robot Command Interface Using an Audio-Visual Speech Recognition System

Times cited: 0
Authors
Ceballos, Alexander [1, 2]
Gomez, Juan [2]
Prieto, Flavio [3]
Redarce, Tanneguy [4]
Affiliations
[1] Inst Tecnol Metropolitano, Medellin, Colombia
[2] Univ Nacional Colombia Sede Manizales, DIEEC, Manizales, Colombia
[3] Univ Nacional Colombia Sede Bogota, DIMM, Bogota, Colombia
[4] Inst Natl Sci Appliquees Lyo, Lyon, France
Source
PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, PROCEEDINGS | 2009 / Vol. 5856
Keywords
Speech recognition; MPEG-4; Manipulator; Laparoscopic surgery
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, audio-visual speech recognition has emerged as an active field of research thanks to advances in pattern recognition, signal processing, and machine vision. Its ultimate goal is to enable human-computer communication by voice, taking into account the visual information contained in the audio-visual speech signal. This paper presents an automatic command recognition system that uses audio-visual information. The system is intended to control the da Vinci laparoscopic surgical robot. The audio signal is parametrized using Mel Frequency Cepstral Coefficients (MFCC). In addition, visual speech information is extracted using features based on the points that define the outer contour of the mouth according to the MPEG-4 standard.
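The abstract names the two feature streams but not their settings. As a rough illustration, the NumPy sketch below computes MFCCs for the audio stream and a few simple shape descriptors from an ordered set of outer-lip contour points. Everything concrete here is an assumption, not the authors' configuration: the 25 ms / 10 ms framing, 512-point FFT, 26 mel filters, 13 coefficients, and 0.97 pre-emphasis, as well as the hypothetical helper `lip_shape_features` and its width/height/area descriptors. The abstract does not say which MPEG-4 points or which geometric features were actually used.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters evenly spaced on the mel scale from 0 Hz to Nyquist.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def dct_ii(x, n_out):
    # Orthonormal DCT-II along the last axis, keeping the first n_out coefficients.
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale

def mfcc(signal, sr, frame_len=0.025, frame_step=0.010,
         n_fft=512, n_filters=26, n_ceps=13, preemph=0.97):
    # Assumed parameter values; returns an array of shape (n_frames, n_ceps).
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    flen = int(round(frame_len * sr))
    fstep = int(round(frame_step * sr))
    n_frames = 1 + max(0, int(np.ceil((len(emphasized) - flen) / fstep)))
    # Zero-pad so the last frame is complete.
    pad_len = (n_frames - 1) * fstep + flen
    padded = np.append(emphasized, np.zeros(pad_len - len(emphasized)))
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = padded[idx] * np.hamming(flen)
    power = (np.abs(np.fft.rfft(frames, n_fft)) ** 2) / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_energies = np.log(np.maximum(energies, 1e-10))  # avoid log(0)
    return dct_ii(log_energies, n_ceps)

def lip_shape_features(points):
    # Hypothetical visual features: points is an (N, 2) array of outer-lip
    # contour coordinates ordered around the contour (e.g. tracked MPEG-4 points).
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    width = x.max() - x.min()
    height = y.max() - y.min()
    # Shoelace formula for the area enclosed by the contour polygon.
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    return np.array([width, height, area])
```

Taking the logarithm of the mel filterbank energies before the DCT is what makes the coefficients cepstral; the DCT is written out by hand only so that the sketch depends on NumPy alone.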
Pages: 869 / +
Number of pages: 3