A Bayesian approach to audio-visual speaker identification

Cited by: 0

Authors:

Nefian, AV [1]

Liang, LH

Fu, TY

Liu, XX

Affiliations:

[1] Intel Corp, Microprocessor Res Labs, Santa Clara, CA 95051 USA

[2] Natl Tsing Hua Univ, Comp Sci & Technol Dept, Hsinchu, Taiwan

Source:

AUDIO-BASED AND VIDEO-BASED BIOMETRIC PERSON AUTHENTICATION, PROCEEDINGS | 2003 / Vol. 2688

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper we describe a text-dependent audio-visual speaker identification approach that combines face recognition and audio-visual speech-based identification systems. The temporal sequences of audio and visual observations, obtained from the acoustic speech and the shape of the mouth, are modeled using a set of coupled hidden Markov models (CHMMs), one for each phoneme-viseme pair and for each person in the database. The use of CHMMs in our system is justified by the capability of this model to describe the natural asynchrony between the audio and visual states as well as their conditional dependence over time. Next, the likelihood obtained for each person in the database is combined with the face recognition likelihood obtained using an embedded hidden Markov model (EHMM). Experimental results on the XM2VTS database show that our system improves on the accuracy of audio-only or video-only speaker identification at all acoustic signal-to-noise ratio (SNR) levels from 5 to 30 dB.
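The two scoring steps in the abstract (a CHMM likelihood for the audio-visual speech, then combination with a face likelihood) can be illustrated with a minimal sketch. It assumes a two-chain CHMM in which each chain's next state is conditioned on the previous states of both chains, discrete emission tables (a simplification chosen for brevity), and fusion by a weighted sum of log-likelihoods followed by an argmax over persons. The abstract does not state the authors' emission densities or combination rule, so the names `chmm_log_likelihood`, `identify`, and the stream `weight` are hypothetical and not the paper's method.

```python
import math
from itertools import product


def safe_log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def chmm_log_likelihood(init_a, init_v, trans_a, trans_v, emit_a, emit_v, obs_a, obs_v):
    """Forward algorithm for a two-chain (audio, visual) coupled HMM (sketch).

    init_a[i], init_v[j]       : initial state probabilities of each chain
    trans_a[i][j][k]           : P(audio state k at t | audio i, visual j at t-1)
    trans_v[i][j][l]           : P(visual state l at t | audio i, visual j at t-1)
    emit_a[k][o], emit_v[l][o] : discrete emission probabilities (assumed)
    obs_a, obs_v               : equal-length, frame-synchronous symbol sequences
    """
    states = list(product(range(len(init_a)), range(len(init_v))))
    # alpha[(i, j)] = log P(observations up to t, audio state i, visual state j at t)
    alpha = {
        (i, j): safe_log(init_a[i]) + safe_log(init_v[j])
        + safe_log(emit_a[i][obs_a[0]]) + safe_log(emit_v[j][obs_v[0]])
        for i, j in states
    }
    for t in range(1, len(obs_a)):
        new_alpha = {}
        for k, l in states:
            # Each chain's transition depends on BOTH chains' previous states;
            # this coupling lets the states drift apart (asynchrony) while
            # remaining conditionally dependent.
            incoming = [
                alpha[(i, j)] + safe_log(trans_a[i][j][k]) + safe_log(trans_v[i][j][l])
                for i, j in states
            ]
            new_alpha[(k, l)] = (
                logsumexp(incoming)
                + safe_log(emit_a[k][obs_a[t]]) + safe_log(emit_v[l][obs_v[t]])
            )
        alpha = new_alpha
    return logsumexp(list(alpha.values()))


def identify(av_loglik, face_loglik, weight=0.5):
    """Late fusion (assumed rule): weighted sum of per-person log-likelihoods.

    Returns the best-scoring person and the full score dictionary.
    """
    scores = {
        person: weight * av_loglik[person] + (1.0 - weight) * face_loglik[person]
        for person in av_loglik
    }
    return max(scores, key=scores.get), scores
```

With a single state per chain the CHMM reduces to a product of emission probabilities, which is a quick sanity check; in a real system the per-person audio-visual score would come from the CHMMs for that person's phoneme-viseme pairs, and the face score from that person's EHMM.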


Pages: 761-769

Number of pages: 9
