Robot Command Interface Using an Audio-Visual Speech Recognition System

Authors
Ceballos, Alexander [1 ,2 ]
Gomez, Juan [2 ]
Prieto, Flavio [3 ]
Redarce, Tanneguy [4 ]
Affiliations
[1] Inst Tecnol Metropolitano, Medellin, Colombia
[2] Univ Nacional Colombia Sede Manizales, DIEEC, Manizales, Colombia
[3] Univ Nacional Colombia Sede Bogota, DIMM, Bogota, Colombia
[4] Inst Natl Sci Appliquees Lyo, Lyon, France
Source
PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, PROCEEDINGS | 2009, Vol. 5856
Keywords
Speech recognition; MPEG-4; manipulator; laparoscopic surgery
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, audio-visual speech recognition has emerged as an active field of research thanks to advances in pattern recognition, signal processing, and machine vision. Its ultimate goal is to allow human-computer communication by voice, taking into account the visual information contained in the audio-visual speech signal. This paper presents an automatic command recognition system that uses audio-visual information. The system is expected to control the da Vinci laparoscopic robot. The audio signal is parameterized using Mel Frequency Cepstral Coefficients. In addition, features based on the points that define the outer contour of the mouth according to the MPEG-4 standard are used to extract the visual speech information.
Pages: 869 / +
Page count: 3