Robot Command Interface Using an Audio-Visual Speech Recognition System

Cited by: 0
Authors
Ceballos, Alexander [1, 2]
Gomez, Juan [2]
Prieto, Flavio [3]
Redarce, Tanneguy [4]
Affiliations
[1] Inst Tecnol Metropolitano, Medellin, Colombia
[2] Univ Nacional Colombia Sede Manizales, DIEEC, Manizales, Colombia
[3] Univ Nacional Colombia Sede Bogota, DIMM, Bogota, Colombia
[4] Inst Natl Sci Appliquees Lyo, Lyon, France
Source
PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, PROCEEDINGS | 2009 / Vol. 5856
Keywords
Speech recognition; MPEG-4; manipulator; LAPAROSCOPIC SURGERY;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, audio-visual speech recognition has emerged as an active field of research thanks to advances in pattern recognition, signal processing, and machine vision. Its ultimate goal is to allow human-computer communication using voice, taking into account the visual information contained in the audio-visual speech signal. This document presents an automatic command recognition system that uses audio-visual information. The system is expected to control the da Vinci laparoscopic robot. The audio signal is parametrized using Mel Frequency Cepstral Coefficients. In addition, features based on the points that define the mouth's outer contour according to the MPEG-4 standard are used to extract the visual speech information.
Pages: 869 / +
Page count: 3
Related papers
50 in total
  • [41] Statistical multimodal integration for audio-visual speech processing
    Nakamura, S
    IEEE TRANSACTIONS ON NEURAL NETWORKS, 2002, 13 (04): : 854 - 866
  • [42] Emotional Audio-Visual Speech Synthesis Based on PAD
    Jia, Jia
    Zhang, Shen
    Meng, Fanbo
    Wang, Yongxin
    Cai, Lianhong
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (03): : 570 - 582
  • [43] Speech-Controlled Human-Computer Interface for Audio-Visual Breast Self-Examination Guidance System
    Billones, Robert Kerwin C.
    Daidos, Elmer P.
    Cabatuan, Melvin K.
    Gan Lim, Laurence
    Galvez, Reagan L.
    Santos Carandang, Jose
    Camilo Punzalan, Eric
    Luisa Enriquez, Ma.
    Erasga, Dennis
    Ples, Michael
    Teruel, Romeo
    Sison-Gareza, Balintawak
    Ellenita Carandang, Ma.
    Carandang, Julien
    2015 INTERNATIONAL CONFERENCE ON HUMANOID, NANOTECHNOLOGY, INFORMATION TECHNOLOGY,COMMUNICATION AND CONTROL, ENVIRONMENT AND MANAGEMENT (HNICEM), 2015, : 535 - +
  • [44] Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis
    Chen, Hang
    Du, Jun
    Dai, Yusheng
    Lee, Chin-Hui
    Siniscalchi, Sabato Marco
    Watanabe, Shinji
    Scharenborg, Odette
    Chen, Jingdong
    Yin, Bao-Cai
    Pan, Jia
    INTERSPEECH 2022, 2022, : 1766 - 1770
  • [45] Audio-Visual End-to-End Multi-Channel Speech Separation, Dereverberation and Recognition
    Li, Guinan
    Deng, Jiajun
    Geng, Mengzhe
    Jin, Zengrui
    Wang, Tianzi
    Hu, Shujie
    Cui, Mingyu
    Meng, Helen
    Liu, Xunying
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 2707 - 2723
  • [46] Multi-Stream Asynchrony Dynamic Bayesian Network model for audio-visual continuous speech recognition
    Lv, Guoyun
    Jiang, Dongmei
    Zhao, Rongchun
    Jiang, Xiaoyue
    Sahli, H.
    2007 14TH INTERNATIONAL WORKSHOP ON SYSTEMS, SIGNALS, & IMAGE PROCESSING & EURASIP CONFERENCE FOCUSED ON SPEECH & IMAGE PROCESSING, MULTIMEDIA COMMUNICATIONS & SERVICES, 2007, : 170 - +
  • [47] A Robust Feature Extraction with Dual Fusion aided Extreme Learning for Audio-Visual Hindi Speech Recognition
    Sharma, Usha
    Om, Hari
    Mishra, A. N.
    JOURNAL OF SCIENTIFIC & INDUSTRIAL RESEARCH, 2020, 79 (05): : 383 - 386
  • [48] AFT-SAM: Adaptive Fusion Transformer with a Sparse Attention Mechanism for Audio-Visual Speech Recognition
    Che, Na
    Zhu, Yiming
    Wang, Haiyan
    Zeng, Xianwei
    Du, Qinsheng
    APPLIED SCIENCES-BASEL, 2025, 15 (01):
  • [49] FEATURE SPACE VIDEO STREAM CONSISTENCY ESTIMATION FOR DYNAMIC STREAM WEIGHTING IN AUDIO-VISUAL SPEECH RECOGNITION
    Terry, Louis H.
    Shiell, Derek J.
    Katsaggelos, Aggelos K.
    2008 15TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOLS 1-5, 2008, : 1316 - 1319
  • [50] Two-Layered Audio-Visual Integration in Voice Activity Detection and Automatic Speech Recognition for Robots
    Yoshida, Takami
    Nakadai, Kazuhiro
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2710 - 2713