Audio-visual biometric based speaker identification

Cited by: 1

Authors:

Kar, Biswajit ^{[1]}

Bhatia, Sandeep ^{[1]}

Dutta, P. K. ^{[1]}

Affiliation:

[1] Indian Inst Technol, Dept Elect Engn, Kharagpur 721302, W Bengal, India

Source:

ICCIMA 2007: INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND MULTIMEDIA APPLICATIONS, VOL IV, PROCEEDINGS | 2007

Keywords:

biometrics; speaker recognition; speaker model; audio visual speech recognition;

DOI:

10.1109/ICCIMA.2007.21

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we present a multimodal audio-visual speaker identification system. The proposed system decomposes the information in a video stream into two components: speech and lip motion. Previous studies have shown that lip information conveys not only speech content but also characteristic information about a person's identity. Fusing this information with speech information yields robust person identification under adverse conditions. Gaussian mixture models (GMMs) and hidden Markov models (HMMs) are used throughout this work for the tasks of text-dependent speaker recognition and mouth tracking. Performance is evaluated on a dataset of 22 Indian speakers of different ethnicities, each uttering a sentence. The results show that the performance of the biometric system is significantly better when both audio and video features are used.
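The abstract does not state how the audio and lip-motion scores are combined. As an illustration only, the following minimal Python sketch assumes score-level (late) fusion: each modality produces one log-likelihood per enrolled speaker (for example from a per-speaker GMM or HMM), the scores are min-max normalized, and a weighted sum selects the identity. The function names, the `audio_weight` parameter, and the example scores are hypothetical and are not taken from the paper.

```python
import math


def min_max_normalize(scores):
    """Rescale a dict of speaker -> score into the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if math.isclose(hi, lo):
        # All scores equal: no speaker is preferred by this modality.
        return {spk: 0.5 for spk in scores}
    return {spk: (s - lo) / (hi - lo) for spk, s in scores.items()}


def fuse_and_identify(audio_scores, visual_scores, audio_weight=0.5):
    """Weighted-sum fusion of per-speaker scores (assumed rule, not the paper's).

    audio_scores / visual_scores: dict mapping speaker id -> log-likelihood
    of the test utterance under that speaker's model.
    Returns the best-scoring speaker and the fused score table.
    """
    a = min_max_normalize(audio_scores)
    v = min_max_normalize(visual_scores)
    fused = {
        spk: audio_weight * a[spk] + (1.0 - audio_weight) * v[spk]
        for spk in a
    }
    best = max(fused, key=fused.get)
    return best, fused


# Made-up log-likelihoods for three enrolled speakers.
audio = {"spk_a": -410.0, "spk_b": -400.0, "spk_c": -450.0}
visual = {"spk_a": -120.0, "spk_b": -180.0, "spk_c": -170.0}

best, fused = fuse_and_identify(audio, visual, audio_weight=0.5)
audio_only, _ = fuse_and_identify(audio, visual, audio_weight=1.0)
print(best, audio_only)
```

In this toy data the audio stream alone slightly favours `spk_b`, while the fused score favours `spk_a` because the visual stream strongly supports it. Lowering `audio_weight` when the audio is noisy is one common way to exploit the second modality under adverse conditions.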


Pages: 94-98

Number of pages: 5

References (11 in total):

[1] Face segmentation using skin-color map in videophone applications [J].

Chai, D ;

Ngan, KN .

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 1999, 9 (04) :551-564

[2]

CHIBELUSHI CC, 2002, IEEE T MULTIMEDIA, V4, P23

[3]

GLOTIN H, 2001, P INT C AC SPEECH SI

[4] An introduction to biometric recognition [J].

Jain, AK ;

Ross, A ;

Prabhakar, S .

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2004, 14 (01) :4-20

[5]

JAVIER OG, 2004, IEEE SIGNAL PROCESSI, P50

[6] Integration strategies for audio-visual speech processing: Applied to text-dependent speaker recognition [J].

Lucey, S ;

Chen, TH ;

Sridharan, S ;

Chandran, V .

IEEE TRANSACTIONS ON MULTIMEDIA, 2005, 7 (03) :495-506

[7] Modelling facial colour and identity with Gaussian mixtures [J].

McKenna, SJ ;

Gong, SG ;

Raja, Y .

PATTERN RECOGNITION, 1998, 31 (12) :1883-1892

[8] ROBUST TEXT-INDEPENDENT SPEAKER IDENTIFICATION USING GAUSSIAN MIXTURE SPEAKER MODELS [J].

REYNOLDS, DA ;

ROSE, RC .

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1995, 3 (01) :72-83

[9]

Ross A, 2001, LECT NOTES COMPUT SC, V2091, P354

[10] Detecting faces in images: A survey [J].

Yang, MH ;

Kriegman, DJ ;

Ahuja, N .

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2002, 24 (01) :34-58
