Multimodal Speaker Identification Based on Text and Speech

Authors
Moschonas, Panagiotis [1 ]
Kotropoulos, Constantine [1 ]
Affiliations
[1] Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki 54124, Greece
Source
BIOMETRICS AND IDENTITY MANAGEMENT | 2008 / Vol. 5372
Keywords
multimodal speaker identification; text; speech; probabilistic latent semantic indexing; Mel-frequency cepstral coefficients; nearest neighbor classifier; convex combination
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a novel method for speaker identification based on both speech utterances and their transcribed text. The transcribed text of each speaker's utterance is processed by probabilistic latent semantic indexing (PLSI), which offers a powerful means of modeling each speaker's vocabulary through a number of hidden topics that are closely related to his/her identity, function, or expertise. Mel-frequency cepstral coefficients (MFCCs) are extracted from each speech frame, and their dynamic range is quantized into a number of predefined bins in order to compute local MFCC histograms for each speech utterance, which is time-aligned with the transcribed text. Two identity scores are computed independently: one by PLSI applied to the text and one by a nearest neighbor classifier applied to the local MFCC histograms. It is demonstrated that a convex combination of the two scores is more accurate than either individual score in speaker identification experiments conducted on broadcast news from the RT-03 MDE Training Data Text and Annotations corpus distributed by the Linguistic Data Consortium.
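The score fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the distance measure, the score normalization, or the combination weight, so the L1 distance, the 1 / (1 + d) similarity, the sum-to-one normalization, and the weight `alpha` are all assumptions, and every function name here is hypothetical. The text scores are passed in as plain numbers, standing in for whatever PLSI would produce.

```python
from typing import Dict, List, Sequence


def local_histogram(values: Sequence[float], lo: float, hi: float, bins: int) -> List[float]:
    """Quantize values from the range [lo, hi] into equal-width bins (normalized counts)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values to the edge bins
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins


def nn_scores(query: List[float], references: Dict[str, List[List[float]]]) -> Dict[str, float]:
    """Score each speaker by the nearest reference histogram.

    Assumption: L1 distance, mapped to a similarity via 1 / (1 + distance).
    """
    scores = {}
    for speaker, hists in references.items():
        dist = min(sum(abs(a - b) for a, b in zip(query, h)) for h in hists)
        scores[speaker] = 1.0 / (1.0 + dist)
    return scores


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to sum to one so the two modalities are comparable (assumption)."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}


def fuse(text: Dict[str, float], speech: Dict[str, float], alpha: float) -> Dict[str, float]:
    """Convex combination: alpha * text + (1 - alpha) * speech, with 0 <= alpha <= 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return {s: alpha * text[s] + (1.0 - alpha) * speech[s] for s in text}


def identify(fused: Dict[str, float]) -> str:
    """Return the speaker with the highest fused score."""
    return max(fused, key=fused.get)
```

With `alpha = 1` the decision rests on the text score alone and with `alpha = 0` on the speech score alone; intermediate values weight the two modalities, and in practice the weight would be tuned on held-out data.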
Pages: 100-109
Number of pages: 10