A LIP GEOMETRY APPROACH FOR FEATURE-FUSION BASED AUDIO-VISUAL SPEECH RECOGNITION

Citations: 0
Authors
Ibrahim, M. Z. [1]
Mulvaney, D. J. [1]
Affiliations
[1] Univ Loughborough, Sch Elect Elect & Syst Engn, Loughborough LE11 3TU, Leics, England
Source
2014 6TH INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS, CONTROL AND SIGNAL PROCESSING (ISCCSP) | 2014
Keywords
Lip geometry; feature fusion; audio-visual speech recognition; OpenCV; INTEGRATION;
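DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract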
This paper describes a feature-fusion audio-visual speech recognition (AVSR) system that extracts lip geometry from the mouth region using a combination of skin color filtering, border following and convex hull computation, and performs classification using a Hidden Markov Model. By defining a small number of highly descriptive geometrical features relevant to the recognition task, the approach avoids the poor scalability (termed the 'curse of dimensionality') that is often associated with feature-fusion AVSR methods. The paper compares the new approach with conventional appearance-based methods, namely the discrete cosine transform and principal component analysis, under simulated ambient noise conditions that affect the spoken phrases. The experimental results demonstrate that, in the presence of audio noise, the geometrical method significantly improves speech recognition accuracy compared with the appearance-based approaches, despite requiring significantly fewer features.
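The geometry step summarized in the abstract can be illustrated with a minimal sketch. It assumes the lip contour has already been obtained as a non-empty list of (x, y) pixel coordinates, for example from a skin color filter followed by border following (in OpenCV these stages are commonly done with `cv2.inRange` and `cv2.findContours`, and the hull with `cv2.convexHull`). The four features used here (width, height, area, perimeter) and the function names `convex_hull`, `lip_geometry_features` and `fuse_features` are illustrative assumptions, not the feature set or code of the paper. Feature fusion is shown as plain concatenation of one audio frame with one visual frame, which is the usual meaning of the term but is likewise assumed here.

```python
from math import hypot


def convex_hull(points):
    """Convex hull via Andrew's monotone chain; vertices returned in boundary order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of the cross product (a - o) x (b - o)
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def lip_geometry_features(contour):
    """Hypothetical 4-D feature vector: [width, height, area, perimeter] of the hull."""
    hull = convex_hull(contour)
    xs = [x for x, _ in hull]
    ys = [y for _, y in hull]
    n = len(hull)
    edges = [(hull[i], hull[(i + 1) % n]) for i in range(n)]
    # Shoelace formula for the polygon area.
    area = abs(sum(a[0] * b[1] - b[0] * a[1] for a, b in edges)) / 2.0
    perimeter = sum(hypot(b[0] - a[0], b[1] - a[1]) for a, b in edges)
    return [float(max(xs) - min(xs)), float(max(ys) - min(ys)), area, perimeter]


def fuse_features(audio_frame, visual_frame):
    """Feature fusion as simple concatenation of time-aligned frames (assumed)."""
    return list(audio_frame) + list(visual_frame)
```

With only a handful of values per video frame, the fused vector stays close to the size of the audio vector alone, which is the scalability argument the abstract makes against appearance-based features.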
Pages: 644-647
Page count: 4