Visual speech feature extraction for improved speech recognition

被引:0
|
作者
Zhang, X [1 ]
Mersereau, RM [1 ]
Clements, M [1 ]
Broun, CC [1 ]
机构
[1] Georgia Inst Technol, Ctr Signal & Image Proc, Atlanta, GA 30332 USA
关键词
D O I
暂无
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Mainstream automatic speech recognition has focused almost exclusively on the acoustic signal. The performance of these systems degrades considerably in the real world in the presence of noise. On the other hand, most human listeners, both hearing-impaired and normal hearing, make use of visual information to improve speech perception in acoustically hostile environments. Motivated by humans' ability to lipread, the visual component is considered to yield information that is not always present in the acoustic signal and enables improved accuracy over totally acoustic systems, especially in noisy environments. In this paper, we investigate the usefulness of visual information in speech recognition. We first present a method for automatically locating and extracting visual speech features from a talking person in color video sequences. We then develop a recognition engine to train and recognize sequences of visual parameters for the purpose of speech recognition. We particularly explore the impact of various combinations of visual features on the recognition accuracy. We conclude that the inner lip contour features together with the information about the visibility of the tongue and teeth significantly improve the perfon-nance over using outer contour only features in both speaker dependent and speaker independent recognition tasks.
引用
收藏
页码:1993 / 1996
页数:4
相关论文
共 50 条
  • [1] Improved Lip Contour Extraction For Visual Speech Recognition
    Chalamala, Srinivasa Rao
    Gudla, Balakrishna
    Yegnanarayana, B.
    Sheela, Anitha K.
    2015 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS (ICCE), 2015, : 459 - 462
  • [2] Study of different feature extraction method for visual speech recognition
    Debnath, Saswati
    Roy, Pinki
    Justin, Vijin
    Naik, Shradha
    2021 INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION AND INFORMATICS (ICCCI), 2021,
  • [3] Automatic Visual Feature Extraction for Mandarin Audio-Visual Speech Recognition
    Pao, Tsang-Long
    Liao, Wen-Yuan
    Wu, Tsan-Nung
    Lin, Ching-Yi
    2009 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC 2009), VOLS 1-9, 2009, : 2936 - 2940
  • [4] A HYBRID VISUAL FEATURE EXTRACTION METHOD FOR AUDIO-VISUAL SPEECH RECOGNITION
    Wu, Guanyong
    Zhu, Jie
    Xu, Haihua
    2009 16TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOLS 1-6, 2009, : 1829 - 1832
  • [5] Visual Feature Extraction for Isolated Word Visual Only Speech Recognition of Vietnamese
    Nguyen Thien Chuong
    Chaloupka, Josef
    2013 36TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2013, : 459 - 463
  • [6] Speech recognition as feature extraction for speaker recognition
    Stolcke, A.
    Shriberg, E.
    Ferrer, L.
    Kajarekar, S.
    Sonmez, K.
    Tur, G.
    2007 IEEE WORKSHOP ON SIGNAL PROCESSING APPLICATIONS FOR PUBLIC SECURITY AND FORENSICS, 2007, : 39 - +
  • [7] Research on Visual Speech Feature Extraction
    He Jun
    Zhang Hua
    2009 INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND TECHNOLOGY, VOL II, PROCEEDINGS, 2009, : 499 - 502
  • [8] Optimizing feature extraction for speech recognition
    Lee, CH
    Hyun, DH
    Choi, ES
    Go, JW
    Lee, CY
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2003, 11 (01): : 80 - 87
  • [9] Feature extraction for robust speech recognition
    Dharanipragada, S
    2002 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOL II, PROCEEDINGS, 2002, : 855 - 858
  • [10] Visual Speech Recognition: a solution from feature extraction to words classification
    Da Silveira, L
    Facon, J
    Borges, DL
    XVI BRAZILIAN SYMPOSIUM ON COMPUTER GRAPHICS AND IMAGE PROCESSING, PROCEEDINGS, 2003, : 399 - 405