Robot Command Interface Using an Audio-Visual Speech Recognition System

Cited by: 0

Authors:

Ceballos, Alexander [1,2]

Gomez, Juan [2]

Prieto, Flavio [3]

Redarce, Tanneguy [4]

Affiliations:

[1] Inst Tecnol Metropolitano, Medellin, Colombia

[2] Univ Nacional Colombia Sede Manizales, DIEEC, Manizales, Colombia

[3] Univ Nacional Colombia Sede Bogota, DIMM, Bogota, Colombia

[4] Inst Natl Sci Appliquees Lyo, Lyon, France

Source:

PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, PROCEEDINGS | 2009 / Vol. 5856

Keywords:

Speech recognition; MPEG-4; manipulator; laparoscopic surgery

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In recent years, audio-visual speech recognition has emerged as an active field of research thanks to advances in pattern recognition, signal processing, and machine vision. Its ultimate goal is to allow human-computer communication by voice while also taking into account the visual information contained in the audio-visual speech signal. This paper presents an automatic command recognition system that uses audio-visual information. The system is expected to control the da Vinci laparoscopic robot. The audio signal is parametrized using Mel Frequency Cepstral Coefficients (MFCC). In addition, visual speech information is extracted using features based on the points that define the outer contour of the mouth according to the MPEG-4 standard.


Pages: 869 / +

Page count: 3

50 items in total

[31] Face-to-talk: Audio-visual speech detection for robust speech recognition in noisy environment
Murai, K
Nakamura, S
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2003, E86D (03): 505 - 513
[32] THE NEW DELFT UNIVERSITY OF TECHNOLOGY DATA CORPUS FOR AUDIO-VISUAL SPEECH RECOGNITION
Chitu, Alin G.
Rothkrantz, Leon J. M.
EUROMEDIA'2009, 2009: 63 - 69
[33] Comparison between different feature extraction techniques for audio-visual speech recognition
Alin G. Chiţu
Leon J. M. Rothkrantz
Pascal Wiggers
Jacek C. Wojdel
Journal on Multimodal User Interfaces, 2007, 1 : 7 - 20
[34] Audio-Visual Speech Recognition Scheme Based on Wavelets and Random Forests Classification
Daniel Terissi, Lucas
Sad, Gonzalo D.
Carlos Gomez, Juan
Parodi, Marianela
PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, CIARP 2015, 2015, 9423 : 567 - 574
[35] A robust visual feature extraction based BTSM-LDA for audio-visual speech recognition
Lv, Guoyun
Zhao, Rongchun
Jiang, Dongmei
Li, Yan
Sahli, H.
2007 SECOND INTERNATIONAL CONFERENCE IN COMMUNICATIONS AND NETWORKING IN CHINA, VOLS 1 AND 2, 2007, : 1044 - +
[36] Cross-Domain Deep Visual Feature Generation for Mandarin Audio-Visual Speech Recognition
Su, Rongfeng
Liu, Xunying
Wang, Lan
Yang, Jingzhou
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 185 - 197
[37] Multimodal information fusion using the iterative decoding algorithm and its application to audio-visual speech recognition
Shivappa, Shankar T.
Rao, Bhaskar D.
Trivedi, Mohan M.
2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008: 2241 - 2244
[38] Taris: An online speech recognition framework with sequence to sequence neural networks for both audio-only and audio-visual speech
Sterpu, George
Harte, Naomi
COMPUTER SPEECH AND LANGUAGE, 2022, 74
[39] MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
Anwar, Mohamed
Shi, Bowen
Goswami, Vedanuj
Hsu, Wei-Ning
Pino, Juan
Wang, Changhan
INTERSPEECH 2023, 2023: 4064 - 4068
[40] SPEAKER-TARGETED AUDIO-VISUAL SPEECH RECOGNITION USING A HYBRID CTC/ATTENTION MODEL WITH INTERFERENCE LOSS
Tsunoda, Ryota
Aihara, Ryo
Takashima, Ryoichi
Takiguchi, Tetsuya
Imai, Yoshie
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 251 - 255
