Performance enhancement for audio-visual speaker identification using dynamic facial muscle model

被引:0
|
作者
Vahid Asadpour
Farzad Towhidkhah
Mohammad Mehdi Homayounpour
机构
[1] Amirkabir University of Technology,Department of Biomedical Engineering
[2] Amirkabir University of Technology,Department of Computer Engineering and IT
关键词
Biometry; Speaker identification; Multistream hidden Markov models; Visual feature extraction;
D O I
暂无
中图分类号
学科分类号
摘要
Science of human identification using physiological characteristics or biometry has been of great concern in security systems. However, robust multimodal identification systems based on audio-visual information has not been thoroughly investigated yet. Therefore, the aim of this work to propose a model-based feature extraction method which employs physiological characteristics of facial muscles producing lip movements. This approach adopts the intrinsic properties of muscles such as viscosity, elasticity, and mass which are extracted from the dynamic lip model. These parameters are exclusively dependent on the neuro-muscular properties of speaker; consequently, imitation of valid speakers could be reduced to a large extent. These parameters are applied to a hidden Markov model (HMM) audio-visual identification system. In this work, a combination of audio and video features has been employed by adopting a multistream pseudo-synchronized HMM training method. Noise robust audio features such as Mel-frequency cepstral coefficients (MFCC), spectral subtraction (SS), and relative spectra perceptual linear prediction (J-RASTA-PLP) have been used to evaluate the performance of the multimodal system once efficient audio feature extraction methods have been utilized. The superior performance of the proposed system is demonstrated on a large multispeaker database of continuously spoken digits, along with a sentence that is phonetically rich. To evaluate the robustness of algorithms, some experiments were performed on genetically identical twins. Furthermore, changes in speaker voice were simulated with drug inhalation tests. In 3 dB signal to noise ratio (SNR), the dynamic muscle model improved the identification rate of the audio-visual system from 91 to 98%. Results on identical twins revealed that there was an apparent improvement on the performance for the dynamic muscle model-based system, in which the identification rate of the audio-visual system was enhanced from 87 to 96%.
引用
收藏
页码:919 / 930
页数:11
相关论文
共 50 条
  • [1] Performance enhancement for audio-visual speaker identification using dynamic facial muscle model
    Asadpour, Vahid
    Towhidkhah, Farzad
    Homayounpour, Mohammad Mehdi
    MEDICAL & BIOLOGICAL ENGINEERING & COMPUTING, 2006, 44 (10) : 919 - 930
  • [2] Audio-visual speaker identification using dynamic facial movements and utterance phonetic content
    Asadpour, Vahid
    Homayounpour, Mohammad Mehdi
    Towhidkhah, Farzad
    APPLIED SOFT COMPUTING, 2011, 11 (02) : 2083 - 2093
  • [3] Audio-visual bimodal speaker identification using dynamic Bayesian networks
    Wu, Zhiyong
    Cai, Lianhong
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2006, 43 (03): : 470 - 475
  • [4] Audio-visual speaker identification based on the use of dynamic audio and visual features
    Fox, N
    Reilly, RB
    AUDIO-BASED AND VIDEO-BASED BIOMETRIC PERSON AUTHENTICATION, PROCEEDINGS, 2003, 2688 : 743 - 751
  • [5] Dynamic visual features for audio-visual speaker verification
    Dean, David
    Sridharan, Sridha
    COMPUTER SPEECH AND LANGUAGE, 2010, 24 (02): : 136 - 149
  • [6] A Bayesian approach to audio-visual speaker identification
    Nefian, AV
    Liang, LH
    Fu, TY
    Liu, XX
    AUDIO-BASED AND VIDEO-BASED BIOMETRIC PERSON AUTHENTICATION, PROCEEDINGS, 2003, 2688 : 761 - 769
  • [7] ENVIRONMENTALLY ROBUST AUDIO-VISUAL SPEAKER IDENTIFICATION
    Schoenherr, Lea
    Orth, Dennis
    Heckmann, Martin
    Kolossa, Dorothea
    2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 312 - 318
  • [8] Audio-visual biometric based speaker identification
    Kar, Biswajit
    Bhatia, Sandeep
    Dutta, P. K.
    ICCIMA 2007: INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND MULTIMEDIA APPLICATIONS, VOL IV, PROCEEDINGS, 2007, : 94 - 98
  • [9] Audio-Visual Feature Fusion for Speaker Identification
    Almaadeed, Noor
    Aggoun, Amar
    Amira, Abbes
    NEURAL INFORMATION PROCESSING, ICONIP 2012, PT I, 2012, 7663 : 56 - 67
  • [10] A Visual Signal Reliability for Robust Audio-Visual Speaker Identification
    Tariquzzaman, Md.
    Kim, Jin Young
    Na, Seung You
    Kim, Hyoung-Gook
    Har, Dongsoo
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2011, E94D (10): : 2052 - 2055