A Bayesian approach to audio-visual speaker identification

Cited by: 0

Authors:

Nefian, AV [1]

Liang, LH

Fu, TY

Liu, XX

Affiliations:

[1] Intel Corp, Microprocessor Res Labs, Santa Clara, CA 95051 USA

[2] Natl Tsing Hua Univ, Comp Sci & Technol Dept, Hsinchu, Taiwan

Source:

AUDIO-BASED AND VIDEO-BASED BIOMETRIC PERSON AUTHENTICATION, PROCEEDINGS | 2003 / Vol. 2688

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper we describe a text-dependent audio-visual speaker identification approach that combines face recognition and audio-visual speech-based identification systems. The temporal sequences of audio and visual observations obtained from the acoustic speech and the shape of the mouth are modeled using a set of coupled hidden Markov models (CHMMs), one for each phoneme-viseme pair and for each person in the database. The use of the CHMM in our system is justified by the capability of this model to describe the natural audio and visual state asynchrony as well as their conditional dependence over time. Next, the likelihood obtained for each person in the database is combined with the face recognition likelihood obtained using an embedded hidden Markov model (EHMM). Experimental results on the XM2VTS database show that our system improves on the accuracy of audio-only or video-only speaker identification at all levels of acoustic signal-to-noise ratio (SNR) from 5 to 30 dB.
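The abstract says only that the per-person audio-visual likelihood is "combined" with the face recognition likelihood; it does not give the combination rule. As a minimal sketch, assuming a weighted sum of log-likelihoods (a common late-fusion choice, not necessarily the authors' method), the identification step could look like the following. The function names, the `face_weight` parameter, and the example scores are all hypothetical.

```python
def fuse_scores(av_loglik, face_loglik, face_weight=0.5):
    """Fuse per-person log-likelihoods with a weighted sum (assumed rule).

    av_loglik:   dict mapping person -> log-likelihood from the audio-visual
                 speech models (CHMMs in the abstract).
    face_loglik: dict mapping person -> log-likelihood from the face model
                 (EHMM in the abstract).
    face_weight: hypothetical weight in [0, 1] given to the face score.
    """
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must lie in [0, 1]")
    if av_loglik.keys() != face_loglik.keys():
        raise ValueError("both models must score the same set of people")
    return {
        person: (1.0 - face_weight) * av_loglik[person]
        + face_weight * face_loglik[person]
        for person in av_loglik
    }


def identify(av_loglik, face_loglik, face_weight=0.5):
    """Return the person with the highest fused log-likelihood."""
    fused = fuse_scores(av_loglik, face_loglik, face_weight)
    return max(fused, key=fused.get)


# Made-up scores for two enrolled people, for illustration only.
av_scores = {"alice": -120.0, "bob": -118.0}
face_scores = {"alice": -40.0, "bob": -55.0}

best_equal = identify(av_scores, face_scores, face_weight=0.5)
best_av_only = identify(av_scores, face_scores, face_weight=0.0)
```

With these made-up scores, equal weighting selects "alice" (fused scores -80.0 versus -86.5), while ignoring the face score selects "bob". In practice the weight would be tuned on held-out data, for example as a function of the acoustic SNR.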

Pages: 761 - 769

Page count: 9
