Audio-visual continuous speech recognition using MPEG-4 compliant visual features

Cited by: 0

Authors:

Aleksic, PS [1]

Williams, JJ [1]

Wu, ZL [1]

Katsaggelos, AK [1]

Affiliations:

[1] Northwestern Univ, Dept Elect & Comp Engn, Evanston, IL 60208 USA

Source:

2002 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOL I, PROCEEDINGS | 2002

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper we use Facial Animation Parameters (FAPs), which the MPEG-4 standard supports for the visual representation of speech, to significantly improve automatic speech recognition (ASR). We describe a robust, automatic algorithm for extracting FAPs from visual data that requires no hand labeling or extensive training procedures. Multi-stream Hidden Markov Models (HMMs) were used to integrate audio and visual information. ASR experiments were performed under both clean and noisy audio conditions using a relatively large vocabulary (approximately 1000 words). The proposed system reduces the word error rate (WER) by 20% to 23% relative to the audio-only ASR WER at various SNRs with additive white Gaussian noise, and by 19% relative to the audio-only ASR WER under clean audio conditions.
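The two quantitative ideas in the abstract, multi-stream fusion and relative WER reduction, can be illustrated with a minimal sketch. It assumes the common multi-stream HMM convention in which each state's emission score is a weighted sum of per-stream log-likelihoods, with audio and visual weights summing to 1; the record does not state which weighting the authors used. The function names and all numeric values are hypothetical and are not taken from the paper.

```python
def multistream_log_likelihood(log_b_audio, log_b_visual, audio_weight):
    """Combine per-stream emission log-likelihoods for one HMM state.

    Assumption: stream exponents are non-negative and sum to 1, so the
    visual weight is 1 - audio_weight. This is a common convention, not
    necessarily the one used in the paper.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    visual_weight = 1.0 - audio_weight
    return audio_weight * log_b_audio + visual_weight * log_b_visual


def relative_wer_reduction(wer_audio_only, wer_audio_visual):
    """Return the WER reduction as a fraction of the audio-only WER."""
    if wer_audio_only <= 0.0:
        raise ValueError("audio-only WER must be positive")
    return (wer_audio_only - wer_audio_visual) / wer_audio_only


if __name__ == "__main__":
    # Hypothetical log-likelihoods for a single frame and state.
    score = multistream_log_likelihood(-10.0, -20.0, audio_weight=0.7)
    print(f"combined log-likelihood: {score:.3f}")

    # Hypothetical WERs (in percent), chosen only to illustrate the formula.
    reduction = relative_wer_reduction(50.0, 40.0)
    print(f"relative WER reduction: {reduction:.0%}")
```

In noisy audio one would typically lower `audio_weight` so that the visual stream contributes more to the combined score; how the weights were chosen in the paper is not given here.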

Pages: 960-963

Page count: 4

