A Bayesian approach to audio-visual speaker identification

Authors
Nefian, AV [1]
Liang, LH
Fu, TY
Liu, XX
Affiliations
[1] Intel Corp, Microprocessor Res Labs, Santa Clara, CA 95051 USA
[2] Natl Tsing Hua Univ, Comp Sci & Technol Dept, Hsinchu, Taiwan
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper we describe a text-dependent audio-visual speaker identification approach that combines face recognition and audio-visual speech-based identification systems. The temporal sequence of audio and visual observations obtained from the acoustic speech and the shape of the mouth are modeled using a set of coupled hidden Markov models (CHMM), one for each phoneme-viseme pair and for each person in the database. The use of CHMM in our system is justified by the capability of this model to describe the natural audio and visual state asynchrony as well as their conditional dependence over time. Next, the likelihood obtained for each person in the database is combined with the face recognition likelihood obtained using an embedded hidden Markov model (EHMM). Experimental results on the XM2VTS database show that our system improves on the accuracy of audio-only or video-only speaker identification at all levels of acoustic signal-to-noise ratio (SNR) from 5 to 30 dB.
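The abstract says that a per-person audio-visual speech likelihood is combined with a per-person face recognition likelihood, but it does not state the combination rule. The sketch below illustrates one common possibility, a weighted sum of log-likelihoods followed by picking the highest-scoring person. The function name `fuse_and_identify`, the `weight` parameter, and the example scores are all hypothetical and are not taken from the paper.

```python
def fuse_and_identify(av_loglik, face_loglik, weight=0.5):
    """Pick the most likely person from two per-person log-likelihood tables.

    av_loglik:   dict mapping person -> log-likelihood from the audio-visual
                 speech models (CHMM scores in the paper's setting).
    face_loglik: dict mapping person -> log-likelihood from the face model
                 (EHMM scores in the paper's setting).
    weight:      ASSUMED fusion weight in [0, 1] on the audio-visual score;
                 the abstract does not specify how the two are combined.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")

    # Only people scored by both systems can be compared.
    people = sorted(av_loglik.keys() & face_loglik.keys())
    if not people:
        raise ValueError("no person has both scores")

    # Weighted sum in the log domain, i.e. a weighted product of likelihoods.
    fused = {
        p: weight * av_loglik[p] + (1.0 - weight) * face_loglik[p]
        for p in people
    }
    best = max(people, key=lambda p: fused[p])
    return best, fused


# Made-up scores for two enrolled people, for illustration only.
av_scores = {"alice": -120.0, "bob": -118.0}
face_scores = {"alice": -40.0, "bob": -55.0}

best_equal, fused_equal = fuse_and_identify(av_scores, face_scores, weight=0.5)
best_av_only, _ = fuse_and_identify(av_scores, face_scores, weight=1.0)
```

With these made-up numbers, equal weighting favors "alice" because her face score outweighs a slightly lower speech score, while relying on the speech score alone (`weight=1.0`) favors "bob". In practice such a weight would likely be tuned to the acoustic SNR, which is again an assumption rather than something the abstract states.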
Pages: 761-769
Page count: 9