MANDARIN AUDIO-VISUAL SPEECH RECOGNITION WITH EFFECTS TO THE NOISE AND EMOTION

Cited by: 0
Authors
Pao, Tsang-Long [1]
Liao, Wen-Yuan [2]
Chen, Yu-Te [1]
Wu, Tsan-Nung [1]
Affiliations
[1] Tatung Univ, Dept Comp Sci & Engn, Taipei 104, Taiwan
[2] DeLin Inst Technol, Dept Comp Sci & Informat Engn, Tucheng City 236, Taipei County, Taiwan
Source
INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL | 2010 / Vol. 6 / No. 02
Keywords
Audio-visual recognition; Feature extraction; Gaussian mixture model; K-nearest neighbour; Hidden Markov model; Weighted-discrete KNN; HIDDEN MARKOV-MODELS; SPEAKER RECOGNITION; FEATURES; EXTRACTION;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a Mandarin audio-visual recognition system that deals with noisy and emotional speech signals. In the proposed approach, we extract the visual features of the lips. These features are very important to the recognition system, especially in noisy conditions or with emotional effects. In this recognition system, we propose to use the weighted-discrete KNN (WD-KNN) as the classifier and compare the results with two popular classifiers, the GMM and HMM, and evaluate their performance by applying them to a Mandarin audio-visual speech corpus. The experimental results of different classifiers at various SNR levels are presented. The results show that using the WD-KNN classifier yields better recognition accuracy than the other classifiers for the Mandarin speech corpus used.
Pages: 711-723
Page count: 13
Related papers
33 in total
  • [1] [Anonymous], 1999, 2 INT C AUD VID BAS
  • [2] BAHLER LG, 1994, P ACOUSTICS SPEECH S, V1, P321
  • [3] Coupled hidden Markov models for complex action recognition
    Brand, M
    Oliver, N
    Pentland, A
    [J]. 1997 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, PROCEEDINGS, 1997, : 994 - 999
  • [4] CARLO DD, 2000, INT J COMPUT VISION, V38, P99
  • [5] CHEN T, 1997, ICASSP, V1, P179
  • [6] Chen TH, 2001, IEEE SIGNAL PROC MAG, V18, P9
  • [7] A review of speech-based bimodal recognition
    Chibelushi, CC
    Deravi, F
    Mason, JSD
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2002, 4 (01) : 23 - 37
  • [8] DAVIS S, 1980, IEEE T ACOUST SPEECH, V4, P357
  • [9] Dudani S. A., 1976, IEEE Transactions on Systems, Man and Cybernetics, VSMC-6, P325, DOI 10.1109/TSMC.1976.5408784
  • [10] Audio-Visual Speech Modeling for Continuous Speech Recognition
    Dupont, Stephane
    Luettin, Juergen
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2000, 2 (03) : 141 - 151