A Bayesian approach to audio-visual speaker identification

被引:0
|
作者
Nefian, AV [1 ]
Liang, LH
Fu, TY
Liu, XX
机构
[1] Intel Corp, Microprocessor Res Labs, Santa Clara, CA 95051 USA
[2] Natl Tsing Hua Univ, Comp Sci & Technol Dept, Hsinchu, Taiwan
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In this paper we describe a text dependent audio-visual speaker identification approach that combines face recognition and audio-visual speech-based identification systems. The temporal sequence of audio and visual observations obtained from the acoustic speech and the shape of the mouth axe modeled using a set of coupled hidden Markov models (CHMM), one for each phoneme-viseme pair and for each person in the database. The use of CHMM in our system is justified by the capability of this model to describe the natural audio and visual state asynchrony as well as their conditional dependence over time. Next, the likelihood obtained for each person in the database is combined with the face recognition likelihood obtained using an embedded hidden Markov model (EHMM). Experimental results on XM2VTS database show that our system improves the accuracy of the audio-only or video-only speaker identification at all levels of acoustic signal-to-noise ratio (SNR) from 5 to 30db.
引用
收藏
页码:761 / 769
页数:9
相关论文
共 50 条
  • [21] Multifactor fusion for audio-visual speaker recognition
    Chetty, Girija
    Tran, Dat
    LECTURE NOTES IN SIGNAL SCIENCE, INTERNET AND EDUCATION (SSIP'07/MIV'07/DIWEB'07), 2007, : 70 - +
  • [22] Audio-visual system for robust speaker recognition
    Chen, Q
    Yang, JG
    Gou, J
    MLMTA '05: Proceedings of the International Conference on Machine Learning Models Technologies and Applications, 2005, : 97 - 103
  • [23] AUDIO-VISUAL SPEAKER IDENTIFICATION WITH MULTI-VIEW DISTANCE METRIC LEARNING
    Zheng, Haomian
    Wang, Meng
    Li, Zhu
    2010 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, 2010, : 4561 - 4564
  • [24] The 'Audio-Visual Face Cover Corpus': Investigations into audio-visual speech and speaker recognition when the speaker's face is occluded by facewear
    Fecher, Natalie
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 2247 - 2250
  • [25] AVA ACTIVE SPEAKER: AN AUDIO-VISUAL DATASET FOR ACTIVE SPEAKER DETECTION
    Roth, Joseph
    Chaudhuri, Sourish
    Klejch, Ondrej
    Marvin, Radhika
    Gallagher, Andrew
    Kaver, Liat
    Ramaswamy, Sharadh
    Stopczynski, Arkadiusz
    Schmid, Cordelia
    Xi, Zhonghua
    Pantofaru, Caroline
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 4492 - 4496
  • [26] Speaker and digit recognition by audio-visual lip biometrics
    Faraj, Maycel Isaac
    Bigun, Josef
    ADVANCES IN BIOMETRICS, PROCEEDINGS, 2007, 4642 : 1016 - +
  • [27] Audio-Visual Speech Recognition in the Presence of a Competing Speaker
    Shao, Xu
    Barker, Jon
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1292 - 1295
  • [28] AUDIO-VISUAL SPEAKER LOCALIZATION VIA WEIGHTED CLUSTERING
    Gebru, Israel D.
    Alameda-Pineda, Xavier
    Horaud, Radu
    Forbes, Florence
    2014 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2014,
  • [29] Speaker independent audio-visual continuous speech recognition
    Liang, LH
    Liu, XX
    Zhao, YB
    Pi, XB
    Nefian, AV
    IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL I AND II, PROCEEDINGS, 2002, : A25 - A28
  • [30] Audio-visual speaker localization using graphical models
    Kushal, Akash
    Rahurkar, Mandar
    Li Fei-Fei
    Ponce, Jean
    Huang, Thomas
    18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2006, : 291 - +