A Bayesian approach to audio-visual speaker identification

Cited by: 0

Authors:

Nefian, AV [1]

Liang, LH

Fu, TY

Liu, XX

Affiliations:

[1] Intel Corp, Microprocessor Res Labs, Santa Clara, CA 95051 USA

[2] Natl Tsing Hua Univ, Comp Sci & Technol Dept, Hsinchu, Taiwan

Source:

AUDIO-BASED AND VIDEO-BASED BIOMETRIC PERSON AUTHENTICATION, PROCEEDINGS | 2003 / Vol. 2688

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper we describe a text-dependent audio-visual speaker identification approach that combines face recognition and audio-visual speech-based identification systems. The temporal sequence of audio and visual observations obtained from the acoustic speech and the shape of the mouth are modeled using a set of coupled hidden Markov models (CHMM), one for each phoneme-viseme pair and for each person in the database. The use of CHMM in our system is justified by the capability of this model to describe the natural audio and visual state asynchrony as well as their conditional dependence over time. Next, the likelihood obtained for each person in the database is combined with the face recognition likelihood obtained using an embedded hidden Markov model (EHMM). Experimental results on the XM2VTS database show that our system improves the accuracy of the audio-only or video-only speaker identification at all levels of acoustic signal-to-noise ratio (SNR) from 5 to 30 dB.
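
The abstract says that a per-person audio-visual likelihood is combined with a per-person face recognition likelihood, but it does not state the combination rule. As a minimal sketch only, the following Python assumes a weighted sum of log-likelihoods, which is one common form of late fusion. The function name `fuse_and_identify`, the parameter `face_weight`, and all scores are hypothetical illustrations, not values or methods taken from the paper.

```python
def fuse_and_identify(av_loglik, face_loglik, face_weight=0.5):
    """Pick the most likely person from two per-person log-likelihood maps.

    Assumption (not from the paper): scores are fused as a convex
    combination of log-likelihoods controlled by `face_weight`.
    """
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must lie in [0, 1]")
    if av_loglik.keys() != face_loglik.keys():
        raise ValueError("both models must score the same set of people")

    # Weighted sum of the two log-likelihoods for each enrolled person.
    fused = {
        person: (1.0 - face_weight) * av_loglik[person]
        + face_weight * face_loglik[person]
        for person in av_loglik
    }
    # The identified speaker is the person with the highest fused score.
    best = max(fused, key=fused.get)
    return best, fused


# Made-up scores for three enrolled people (illustration only).
av_scores = {"alice": -120.0, "bob": -118.0, "carol": -131.0}
face_scores = {"alice": -40.0, "bob": -55.0, "carol": -42.0}

best_person, fused_scores = fuse_and_identify(av_scores, face_scores, face_weight=0.5)
print(best_person, fused_scores[best_person])
```

With these made-up scores, the audio-visual model alone slightly prefers "bob", while the equally weighted fusion selects "alice". In a noisy acoustic setting one might raise `face_weight`, though how the paper sets any such weighting is not stated in the abstract.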


Pages: 761-769

Number of pages: 9
