Multimodal Speaker Identification Based on Text and Speech

Cited by: 0
Authors
Moschonas, Panagiotis [1]
Kotropoulos, Constantine [1]
Affiliation
[1] Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki 54124, Greece
Source
BIOMETRICS AND IDENTITY MANAGEMENT | 2008 / Vol. 5372
Keywords
multimodal speaker identification; text; speech; probabilistic latent semantic indexing; Mel-frequency cepstral coefficients; nearest neighbor classifier; convex combination;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper proposes a novel method for speaker identification based on both speech utterances and their transcribed text. The transcribed text of each speaker's utterance is processed by probabilistic latent semantic indexing (PLSI), which offers a powerful means to model each speaker's vocabulary employing a number of hidden topics that are closely related to his/her identity, function, or expertise. Mel-frequency cepstral coefficients (MFCCs) are extracted from each speech frame, and their dynamic range is quantized to a number of predefined bins in order to compute MFCC local histograms for each speech utterance, which is time-aligned with the transcribed text. Two identity scores are independently computed: one by PLSI applied to the text, and one by the nearest neighbor classifier applied to the local MFCC histograms. It is demonstrated that a convex combination of the two scores is more accurate than the individual scores in speaker identification experiments conducted on broadcast news from the RT-03 MDE Training Data Text and Annotations corpus distributed by the Linguistic Data Consortium.
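The score fusion step named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names fuse_scores and identify, the weight name alpha, the speaker labels, the toy score values, and the premise that both modalities already yield similarity-like scores on a comparable scale (a nearest neighbor distance would first have to be converted into such a score). The abstract does not say how the weight was chosen.

```python
def fuse_scores(text_scores, speech_scores, alpha):
    """Convex combination of two per-speaker score dictionaries.

    alpha weights the text score and (1 - alpha) weights the speech score.
    Higher scores are assumed to mean a better match (an assumption).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1] for a convex combination")
    if text_scores.keys() != speech_scores.keys():
        raise ValueError("both modalities must score the same set of speakers")
    return {
        speaker: alpha * text_scores[speaker]
        + (1.0 - alpha) * speech_scores[speaker]
        for speaker in text_scores
    }


def identify(scores):
    """Return the speaker label with the highest score."""
    return max(scores, key=scores.get)


# Hypothetical toy scores for three hypothetical speakers.
text_scores = {"anchor": 0.50, "reporter": 0.30, "guest": 0.20}
speech_scores = {"anchor": 0.25, "reporter": 0.60, "guest": 0.15}

# Equal weighting of the two modalities (an arbitrary choice for the sketch).
fused = fuse_scores(text_scores, speech_scores, alpha=0.5)
best = identify(fused)
```

Setting alpha to 1.0 or 0.0 recovers the text-only or speech-only decision, so the single-modality identifiers are special cases of the fused one. In practice the weight would presumably be tuned on held-out data.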
Pages: 100 - 109
Page count: 10
Related papers
50 in total
  • [1] PSO Based Optimized Reliability for Robust Multimodal Speaker Identification
    Tariquzzaman, Md.
    Kim, Jin Young
    Na, Seung You
    CISST'10: PROCEEDINGS OF THE 4TH WSEAS INTERNATIONAL CONFERENCE ON CIRCUITS, SYSTEMS, SIGNAL AND TELECOMMUNICATIONS, 2009, : 157 - 162
  • [2] Spectral Restoration Based Speech Enhancement for Robust Speaker Identification
    Saleem, Nasir
    Tareen, Tayyaba Gul
    INTERNATIONAL JOURNAL OF INTERACTIVE MULTIMEDIA AND ARTIFICIAL INTELLIGENCE, 2018, 5 (01): : 34 - 39
  • [3] Speaker identification utilizing noncontemporary speech
    Hollien, H
    Schwartz, R
    JOURNAL OF FORENSIC SCIENCES, 2001, 46 (01) : 63 - 67
  • [4] Multimodal speaker identification using an adaptive classifier cascade based on modality reliability
    Erzin, E
    Yemez, Y
    Tekalp, AM
    IEEE TRANSACTIONS ON MULTIMEDIA, 2005, 7 (05) : 840 - 852
  • [5] Text-independent Speaker Identification in Birds
    Fox, E. J. S.
    Roberts, J. D.
    Bennamoun, M.
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2122 - 2125
  • [6] Continuous Speech Recognition and Identification of the Speaker System
    Guffanti, Diego
    Martinez, Danilo
    Paladines, Jose
    Sarmiento, Andrea
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY & SYSTEMS (ICITS 2018), 2018, 721 : 767 - 776
  • [7] Extraction of audio features specific to speech production for multimodal speaker detection
    Besson, Patricia
    Popovici, Vlad
    Vesin, Jean-Marc
    Thiran, Jean-Philippe
    Kunt, Murat
    IEEE TRANSACTIONS ON MULTIMEDIA, 2008, 10 (01) : 63 - 73
  • [8] An MFCC-based text-independent speaker identification system for access control
    Liu, Jung-Chun
    Leu, Fang-Yie
    Lin, Guan-Liang
    Susanto, Heru
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2018, 30 (02):
  • [9] Higher order information set based features for text-independent speaker identification
    Medikonda J.
    Madasu H.
    International Journal of Speech Technology, 2018, 21 (03) : 451 - 461
  • [10] Text-Independent Speaker Identification Using Vowel Formants
    Noor Almaadeed
    Amar Aggoun
    Abbes Amira
    Journal of Signal Processing Systems, 2016, 82 : 345 - 356