Robot Command Interface Using an Audio-Visual Speech Recognition System

Cited by: 0

Authors:

Ceballos, Alexander ^{[1,2]}

Gomez, Juan ^{[2]}

Prieto, Flavio ^{[3]}

Redarce, Tanneguy ^{[4]}

Affiliations:

[1] Inst Tecnol Metropolitano, Medellin, Colombia

[2] Univ Nacional Colombia Sede Manizales, DIEEC, Manizales, Colombia

[3] Univ Nacional Colombia Sede Bogota, DIMM, Bogota, Colombia

[4] Inst Natl Sci Appliquees Lyo, Lyon, France

Source:

PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, PROCEEDINGS | 2009 / Vol. 5856

Keywords:

Speech recognition; MPEG-4; manipulator; laparoscopic surgery

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In recent years, audio-visual speech recognition has emerged as an active field of research thanks to advances in pattern recognition, signal processing, and machine vision. Its ultimate goal is to allow human-computer communication by voice while also taking into account the visual information contained in the audio-visual speech signal. This document presents an automatic command recognition system that uses audio-visual information. The system is expected to control the da Vinci laparoscopic robot. The audio signal is parametrized using Mel Frequency Cepstral Coefficients. In addition, features based on the points that define the mouth's outer contour according to the MPEG-4 standard are used to extract the visual speech information.


Pages: 869 / +

Page count: 3

