A Bayesian approach to audio-visual speaker identification

Cited by: 0
Authors
Nefian, AV [1]
Liang, LH
Fu, TY
Liu, XX
Affiliations
[1] Intel Corp, Microprocessor Res Labs, Santa Clara, CA 95051 USA
[2] Natl Tsing Hua Univ, Comp Sci & Technol Dept, Hsinchu, Taiwan
DOI: not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
In this paper we describe a text-dependent audio-visual speaker identification approach that combines face recognition and audio-visual speech-based identification systems. The temporal sequence of audio and visual observations obtained from the acoustic speech and the shape of the mouth are modeled using a set of coupled hidden Markov models (CHMMs), one for each phoneme-viseme pair and for each person in the database. The use of CHMMs in our system is justified by the capability of this model to describe the natural audio and visual state asynchrony as well as their conditional dependence over time. Next, the likelihood obtained for each person in the database is combined with the face recognition likelihood obtained using an embedded hidden Markov model (EHMM). Experimental results on the XM2VTS database show that our system improves on the accuracy of audio-only or video-only speaker identification at all levels of acoustic signal-to-noise ratio (SNR) from 5 to 30 dB.
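The combination step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so the weighted sum of log-likelihoods used here is an assumption, as are the function name `fuse_and_identify`, the parameter `face_weight`, and the example scores; none of them come from the paper.

```python
def fuse_and_identify(av_loglik, face_loglik, face_weight=0.5):
    """Pick the most likely speaker from two per-person log-likelihood maps.

    av_loglik:   person -> log-likelihood from the audio-visual speech models
                 (CHMMs in the paper; here just precomputed numbers).
    face_loglik: person -> log-likelihood from the face model
                 (an EHMM in the paper; here just precomputed numbers).
    face_weight: ASSUMED fusion weight in [0, 1]; the abstract does not
                 specify how the two likelihoods are weighted.
    """
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must lie in [0, 1]")
    if av_loglik.keys() != face_loglik.keys():
        raise ValueError("both score maps must cover the same people")

    # Weighted sum of log-likelihoods, i.e. a weighted product of likelihoods.
    fused = {
        person: (1.0 - face_weight) * av_loglik[person]
        + face_weight * face_loglik[person]
        for person in av_loglik
    }
    # Identification: the enrolled person with the highest fused score.
    best = max(fused, key=fused.get)
    return best, fused
```

Setting `face_weight` to 0 or 1 reduces the sketch to a single-modality decision, which makes it easy to compare fused and unfused results on the same scores.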
Pages: 761-769
Page count: 9