Audio-visual imposture

Cited by: 0

Authors:

Karam, Walid [1]

Mokbel, Chafic [1]

Greige, Hanna [1]

Chollet, Gerard [2]

Affiliations:

[1] Univ Balamand, Dept Comp Sci, POB 100, Tripoli, Lebanon

[2] Ecole Natl Super Telecommun Bretagne, F-75634 Paris, France

Source:

MOBILE MULTIMEDIA/IMAGE PROCESSING FOR MILITARY AND SECURITY APPLICATIONS | 2006 / Vol. 6250

Keywords:

speaker verification; voice transformation; active appearance models; Gaussian mixture models; modality fusion; face detection; face tracking; face model; MPEG-4

DOI:

10.1117/12.665707

Chinese Library Classification number:

TB8 [Photographic technology];

Discipline classification code:

0804 ;

Abstract:

A GMM-based audio-visual speaker verification system is described, and an Active Appearance Model (AAM) combined with a linear speaker transformation system is used to evaluate the robustness of the verification. The AAM is used to automatically locate and track a speaker's face in a video recording. A Gaussian Mixture Model (GMM) based classifier (BECARS) is used for face verification. GMM training and testing are performed on DCT-based features extracted from the detected faces. On the audio side, speech features are extracted and used for speaker verification with the GMM-based classifier. Fusion of the audio and video modalities for audio-visual speaker verification is compared with face verification and speaker verification systems. To improve the robustness of the multimodal biometric identity verification system, an audio-visual imposture system is envisioned. It consists of an automatic voice transformation technique that an impostor may use to assume the identity of an authorized client. Features of the transformed voice are then combined with the corresponding appearance features and fed into the GMM-based system BECARS for training. An attempt is made to increase the acceptance rate of the impostor and to analyze the robustness of the verification system. Experiments are being conducted on the BANCA database, with the prospect of experimenting on the PDAtabase newly developed within the scope of the SecurePhone project.
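The abstract does not say how the GMM scores are computed or how the two modalities are fused. As a rough illustration only, the sketch below assumes a common setup: each modality is scored by the average log-likelihood ratio between a client GMM and a background ("world") GMM with diagonal covariances, and the two scores are combined by a weighted sum. The function names, the `(weights, means, variances)` model layout, the fusion weight, and the threshold are all hypothetical and are not taken from the paper or from BECARS.

```python
import numpy as np

def gmm_avg_loglik(X, weights, means, variances):
    """Average per-frame log-likelihood of X (n_frames, dim) under a diagonal-covariance GMM."""
    X = np.asarray(X, dtype=float)
    weights = np.asarray(weights, dtype=float)      # (k,)
    means = np.asarray(means, dtype=float)          # (k, dim)
    variances = np.asarray(variances, dtype=float)  # (k, dim)
    diff = X[:, None, :] - means[None, :, :]        # (n, k, dim)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)
    log_weighted = np.log(weights) + log_comp       # (n, k)
    # Log-sum-exp over components, shifted by the row maximum for stability.
    m = log_weighted.max(axis=1, keepdims=True)
    loglik = m[:, 0] + np.log(np.exp(log_weighted - m).sum(axis=1))
    return float(loglik.mean())

def llr_score(X, client, world):
    """Assumed score: client log-likelihood minus world-model log-likelihood."""
    return gmm_avg_loglik(X, *client) - gmm_avg_loglik(X, *world)

def fuse_scores(audio_score, face_score, audio_weight=0.5):
    """Assumed fusion rule: weighted sum of the two modality scores."""
    return audio_weight * audio_score + (1.0 - audio_weight) * face_score

def verify(audio_feats, face_feats, audio_models, face_models,
           audio_weight=0.5, threshold=0.0):
    """Accept the identity claim if the fused score exceeds the threshold.

    audio_models and face_models are (client, world) pairs, where each model
    is a (weights, means, variances) tuple.
    """
    s_audio = llr_score(audio_feats, *audio_models)
    s_face = llr_score(face_feats, *face_models)
    return fuse_scores(s_audio, s_face, audio_weight) > threshold
```

In such a setup, an imposture experiment of the kind the abstract describes would feed transformed-voice and face features through `verify` under a client's identity claim and count how often the claim is accepted.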


Pages: 11
