Real-time speech synthesis system driven by visual speech

Cited by: 0

Authors:

Li, G [1]

Xie, GM [1]

Lin, L [1]

Affiliations:

[1] Tianjin Univ, State Key Lab Precis Measurement Technol & Instru, Tianjin 300072, Peoples R China

Source:

PROCEEDINGS OF THE THIRD INTERNATIONAL SYMPOSIUM ON INSTRUMENTATION SCIENCE AND TECHNOLOGY, VOL 2 | 2004

Keywords:

speechreading; speech recognition; active contour model; hidden Markov model; speech synthesis;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Visual information from the speaker's mouth region carries speech information and benefits both human speech perception and automatic speech recognition. In this paper, we investigate recognizing speech from visual information alone and propose a real-time speech synthesis system driven by visual speech. The system consists of three components: a visual front end, a lipreading recognizer, and a speech synthesis module. The visual front end first enhances color lip-movement images with a lip chromatic filter and then extracts relevant speech features using a novel active contour model (Snake). An adaptive force is introduced at a point of the Snake, so that the Snake's control points do not depend on the position of the original contour and the Snake converges quickly on the real target. The recognizer, based on continuous hidden Markov models (HMMs), is trained on and recognizes sequences of the combined visual features. The speech synthesis module synthesizes acoustic speech from the recognition results. Experimental results show that the system achieved 71.7% recognition accuracy on 10 isolated words from a single speaker. The system can be applied to the rehabilitation of people with speech impairments.
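The isolated-word recognition step described in the abstract can be illustrated with a minimal sketch: one continuous HMM per word, with the recognized word being the one whose model assigns the highest likelihood to the observed feature sequence. This is not the authors' implementation. The single scalar feature (imagined here as a lip-opening height), the two-state left-to-right topology, the single-Gaussian emissions, and all names and parameter values (`make_model`, `WORD_MODELS`, the words "open" and "close") are assumptions made for illustration; the abstract does not specify them.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def safe_log(p):
    """log(p), with log(0) mapped to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def make_model(pi, a, means, variances):
    """Bundle HMM parameters, storing probabilities in log space."""
    log_pi = [safe_log(p) for p in pi]
    log_a = [[safe_log(p) for p in row] for row in a]
    return log_pi, log_a, means, variances


def log_likelihood(obs, model):
    """Forward algorithm in log space for a Gaussian-emission HMM."""
    log_pi, log_a, means, variances = model
    n = len(log_pi)
    alpha = [log_pi[i] + log_gauss(obs[0], means[i], variances[i])
             for i in range(n)]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + log_a[i][j] for i in range(n)])
            + log_gauss(x, means[j], variances[j])
            for j in range(n)
        ]
    return logsumexp(alpha)


def recognize(obs, word_models):
    """Return the word whose HMM gives the highest log-likelihood."""
    return max(word_models, key=lambda w: log_likelihood(obs, word_models[w]))


# Hypothetical toy models: the feature rises for "open" and falls for "close".
LEFT_TO_RIGHT = [[0.5, 0.5], [0.0, 1.0]]
WORD_MODELS = {
    "open": make_model([1.0, 0.0], LEFT_TO_RIGHT, [1.0, 3.0], [0.25, 0.25]),
    "close": make_model([1.0, 0.0], LEFT_TO_RIGHT, [3.0, 1.0], [0.25, 0.25]),
}

if __name__ == "__main__":
    print(recognize([1.1, 0.9, 2.8, 3.2], WORD_MODELS))
    print(recognize([3.1, 2.9, 1.2, 0.8], WORD_MODELS))
```

In a real system the model parameters would be estimated from training sequences (for example with Baum-Welch re-estimation) instead of being set by hand, and the observations would be multi-dimensional feature vectors extracted from the lip contour.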


Pages: 397-402

Number of pages: 6

References (7 in total):

[1] Dupont, Stephane; Luettin, Juergen. Audio-Visual Speech Modeling for Continuous Speech Recognition [J]. IEEE Transactions on Multimedia, 2000, 2 (03) : 141 - 151
[2] Kass, M; Witkin, A; Terzopoulos, D. Snakes - Active Contour Models [J]. International Journal of Computer Vision, 1987, 1 (04) : 321 - 331
[3] Potamianos, G; Neti, C; Gravier, G; Garg, A; Senior, AW. Recent advances in the automatic recognition of audiovisual speech [J]. Proceedings of the IEEE, 2003, 91 (09) : 1306 - 1326
[4] Potamianos G, 2001, P INT C AUD VIS SPEE, P177
[5] Rabiner, LR. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition [J]. Proceedings of the IEEE, 1989, 77 (02) : 257 - 286
[6] XUN YH, 2002, ACTA ELECT SINICA, V30, P153
[7] XUN YH, 2001, ACTA ELECT SINICA, V29, P239
