Multimodal Speaker Identification Based on Text and Speech

Authors
Moschonas, Panagiotis [1]
Kotropoulos, Constantine [1]
Affiliations
[1] Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki 54124, Greece
Source
BIOMETRICS AND IDENTITY MANAGEMENT | 2008 / Vol. 5372
Keywords
multimodal speaker identification; text; speech; probabilistic latent semantic indexing; Mel-frequency cepstral coefficients; nearest neighbor classifier; convex combination;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper proposes a novel method for speaker identification based on both speech utterances and their transcribed text. The transcribed text of each speaker's utterance is processed by probabilistic latent semantic indexing (PLSI), which offers a powerful means of modeling each speaker's vocabulary through a number of hidden topics that are closely related to his/her identity, function, or expertise. Mel-frequency cepstral coefficients (MFCCs) are extracted from each speech frame, and their dynamic range is quantized into a number of predefined bins in order to compute local MFCC histograms for each speech utterance, which is time-aligned with the transcribed text. Two identity scores are computed independently: one by PLSI applied to the text, and one by a nearest neighbor classifier applied to the local MFCC histograms. A convex combination of the two scores is shown to be more accurate than either individual score in speaker identification experiments conducted on broadcast news from the RT-03 MDE Training Data Text and Annotations corpus distributed by the Linguistic Data Consortium.
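The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the L1 distance, the 1 / (1 + d) similarity mapping, the score normalization, and the weight `alpha` are all assumptions made for illustration, and the PLSI text scores are treated as given inputs rather than computed.

```python
def quantized_histogram(values, lo, hi, n_bins):
    """Normalized histogram of one MFCC coefficient over n_bins equal-width bins.

    Values outside [lo, hi] are clipped to the range (an assumption).
    """
    if not values:
        raise ValueError("values must not be empty")
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        clipped = min(max(v, lo), hi)
        idx = min(int((clipped - lo) / width), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def nearest_neighbor_scores(query_hist, reference_hists):
    """Score each speaker by the L1 distance to their closest reference histogram.

    The distance d is mapped to a similarity 1 / (1 + d); both choices are
    assumptions, not taken from the paper.
    """
    scores = {}
    for speaker, hists in reference_hists.items():
        d = min(sum(abs(a - b) for a, b in zip(query_hist, h)) for h in hists)
        scores[speaker] = 1.0 / (1.0 + d)
    return scores


def normalize(scores):
    """Rescale scores to sum to 1 so both modalities are comparable."""
    total = sum(scores.values())
    return {speaker: s / total for speaker, s in scores.items()}


def fuse(text_scores, speech_scores, alpha):
    """Convex combination: alpha * text + (1 - alpha) * speech, 0 <= alpha <= 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return {
        speaker: alpha * text_scores[speaker] + (1.0 - alpha) * speech_scores[speaker]
        for speaker in text_scores
    }


def identify(text_scores, speech_scores, alpha):
    """Return the speaker with the highest fused score."""
    fused = fuse(normalize(text_scores), normalize(speech_scores), alpha)
    return max(fused, key=fused.get)
```

Setting `alpha` to 1 reduces the decision to the text score alone, and setting it to 0 reduces it to the speech score alone; intermediate values weigh the two modalities against each other. How the weight should be chosen is not stated in the abstract.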
Pages: 100 - 109
Number of pages: 10