Recognition of visual speech elements using adaptively boosted hidden Markov models

Cited by: 29

Authors:

Foo, SW [1]

Lian, Y

Dong, L

Affiliations:

[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore

[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 119260, Singapore

[3] Natl Univ Singapore, Dept Elect & Comp Engn, Digital Syst & Applicat Lab, Singapore 117548, Singapore

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2004 / Vol. 14 / No. 05

Keywords:

adaptive boosting (AdaBoost); automatic lip reading; hidden Markov model (HMM); visual speech processing;

DOI:

10.1109/TCSVT.2004.826773

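Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code: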

0808 ; 0809 ;

Abstract:

The performance of an automatic speech recognition (ASR) system can be significantly enhanced with additional information from visual speech elements such as the movement of the lips, tongue, and teeth, especially in noisy environments. In this paper, a novel approach for the recognition of visual speech elements is presented. The approach makes use of adaptive boosting (AdaBoost) and hidden Markov models (HMMs) to build an AdaBoost-HMM classifier. The composite HMMs of the AdaBoost-HMM classifier are trained to cover different groups of training samples using the AdaBoost technique and the biased Baum-Welch training method. By combining the decisions of the component classifiers of the composite HMMs according to a novel probability synthesis rule, a more complex decision boundary is formulated than with a single HMM classifier. The method is applied to the recognition of the basic visual speech elements. Experimental results show that the AdaBoost-HMM classifier outperforms the traditional HMM classifier in accuracy, especially for visemes extracted from contexts.


Pages: 693-705

Page count: 13
