A LIP GEOMETRY APPROACH FOR FEATURE-FUSION BASED AUDIO-VISUAL SPEECH RECOGNITION

Cited by: 0

Authors:

Ibrahim, M. Z. [1]

Mulvaney, D. J. [1]

Affiliation:

[1] Univ Loughborough, Sch Elect Elect & Syst Engn, Loughborough LE11 3TU, Leics, England

Source:

2014 6TH INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS, CONTROL AND SIGNAL PROCESSING (ISCCSP) | 2014

Keywords:

Lip geometry; feature fusion; audio-visual speech recognition; OpenCV; INTEGRATION;

DOI:

Not available

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

This paper describes a feature-fusion audio-visual speech recognition (AVSR) system that extracts lip geometry from the mouth region using a combination of a skin color filter, border following and a convex hull, and performs classification using a Hidden Markov Model. By defining a small number of highly descriptive geometrical features relevant to the recognition task, the approach avoids the poor scalability (termed the 'curse of dimensionality') that is often associated with feature-fusion AVSR methods. The paper compares the new approach with conventional appearance-based methods, namely the discrete cosine transform and principal component analysis, under simulated ambient noise conditions that affect the spoken phrases. The experimental results demonstrate that, in the presence of audio noise, the geometrical method significantly improves speech recognition accuracy compared with the appearance-based approaches, despite requiring significantly fewer features.
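The convex hull step and the idea of a small geometric feature set can be illustrated with a minimal sketch. It is not the authors' implementation: the paper lists OpenCV as a keyword, whereas this sketch uses pure Python, and the abstract does not say which geometric features are used. The four features below (width, height, area, perimeter) and the function names `convex_hull` and `lip_geometry_features` are assumptions chosen for illustration. The input contour is assumed to have already been obtained by skin color filtering and border following.

```python
from math import hypot


def convex_hull(points):
    """Andrew's monotone chain convex hull.

    Returns the hull vertices in order around the hull
    (counter-clockwise in a y-up coordinate frame).
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other chain
    return lower[:-1] + upper[:-1]


def lip_geometry_features(contour):
    """Compute a small, illustrative feature set from a lip contour.

    `contour` is a list of (x, y) pixel coordinates. The feature choice
    is an assumption; the abstract does not enumerate the features.
    """
    hull = convex_hull(contour)
    if len(hull) < 3:
        raise ValueError("contour must contain at least 3 non-collinear points")

    xs = [x for x, _ in hull]
    ys = [y for _, y in hull]
    n = len(hull)
    edges = [(hull[i], hull[(i + 1) % n]) for i in range(n)]

    # Shoelace formula for the polygon area
    area = abs(sum(a[0] * b[1] - b[0] * a[1] for a, b in edges)) / 2
    perimeter = sum(hypot(b[0] - a[0], b[1] - a[1]) for a, b in edges)

    return {
        "width": max(xs) - min(xs),
        "height": max(ys) - min(ys),
        "area": area,
        "perimeter": perimeter,
    }


# Hypothetical contour: a 4 x 2 rectangle with one interior point
example = lip_geometry_features([(0, 0), (4, 0), (4, 2), (0, 2), (2, 1)])
```

A per-frame vector of this size is far smaller than a block of DCT or PCA coefficients, which is the property the abstract credits with avoiding the dimensionality problem when visual features are concatenated with audio features.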


Pages: 644-647

Page count: 4

