Multimodal Speaker Identification Based on Text and Speech

Cited by: 0

Authors:

Moschonas, Panagiotis [1]

Kotropoulos, Constantine [1]

Affiliations:

[1] Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki 54124, Greece

Source:

BIOMETRICS AND IDENTITY MANAGEMENT | 2008 / Vol. 5372

Keywords:

multimodal speaker identification; text; speech; probabilistic latent semantic indexing; Mel-frequency cepstral coefficients; nearest neighbor classifier; convex combination

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper proposes a novel method for speaker identification based on both speech utterances and their transcribed text. The transcribed text of each speaker's utterance is processed by probabilistic latent semantic indexing (PLSI), which offers a powerful means to model each speaker's vocabulary by employing a number of hidden topics that are closely related to his/her identity, function, or expertise. Mel-frequency cepstral coefficients (MFCCs) are extracted from each speech frame, and their dynamic range is quantized into a number of predefined bins in order to compute MFCC local histograms for each speech utterance, which is time-aligned with the transcribed text. Two identity scores are independently computed: one by the PLSI applied to the text and one by the nearest neighbor classifier applied to the local MFCC histograms. It is demonstrated that a convex combination of the two scores is more accurate than the individual scores in speaker identification experiments conducted on broadcast news of the RT-03 MDE Training Data Text and Annotations corpus distributed by the Linguistic Data Consortium.
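The convex combination of the two identity scores mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the min-max normalization step, the weight `alpha`, and the convention that a higher score means a better match (so nearest neighbor distances would first have to be converted to similarities) are all assumptions made for illustration.

```python
from typing import Dict


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so the two modalities are comparable.

    Assumption: the record does not say how scores are normalized;
    min-max scaling is used here only as a simple placeholder.
    """
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {speaker: 0.0 for speaker in scores}
    return {speaker: (v - lo) / (hi - lo) for speaker, v in scores.items()}


def fuse_scores(
    text_scores: Dict[str, float],
    speech_scores: Dict[str, float],
    alpha: float,
) -> Dict[str, float]:
    """Convex combination: alpha * text + (1 - alpha) * speech.

    `alpha` is a hypothetical weight in [0, 1]. Both inputs are assumed
    to be similarity scores (higher means a better match).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1] for a convex combination")
    if text_scores.keys() != speech_scores.keys():
        raise ValueError("both modalities must score the same speakers")
    t = min_max_normalize(text_scores)
    s = min_max_normalize(speech_scores)
    return {spk: alpha * t[spk] + (1.0 - alpha) * s[spk] for spk in t}


def identify(
    text_scores: Dict[str, float],
    speech_scores: Dict[str, float],
    alpha: float,
) -> str:
    """Return the speaker with the highest fused score."""
    fused = fuse_scores(text_scores, speech_scores, alpha)
    return max(fused, key=fused.get)


# Made-up scores for three hypothetical speakers.
text_scores = {"A": 0.2, "B": 0.9, "C": 0.1}
speech_scores = {"A": 0.8, "B": 0.5, "C": 0.1}
print(identify(text_scores, speech_scores, alpha=0.5))
```

Setting `alpha` to 1 or 0 reduces the decision to the text-only or speech-only score, which is how the fused result could be compared against each modality on its own.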


Pages: 100-109

Page count: 10

50 items in total

[41] Multimodal Media Center Interface Based on Speech, Gestures and Haptic Feedback
Turunen, Markku
Hakulinen, Jaakko
Hella, Juho
Rajaniemi, Juha-Pekka
Melto, Aleksi
Makinen, Erno
Rantala, Jussi
Heimonen, Tomi
Laivo, Tuuli
Soronen, Hannu
Hansen, Mervi
Valkama, Pellervo
Miettinen, Toni
Raisamo, Roope
HUMAN-COMPUTER INTERACTION - INTERACT 2009, PT II, PROCEEDINGS, 2009, 5727 : 54 - +
[42] Multimodal Emotion Recognition Based on Facial Expressions, Speech, and Body Gestures
Yan, Jingjie
Li, Peiyuan
Du, Chengkun
Zhu, Kang
Zhou, Xiaoyang
Liu, Ying
Wei, Jinsheng
ELECTRONICS, 2024, 13 (18)
[43] Alternative Creation of Text to Speech Technology for the Albanian Language
Koshi, Blerand
Bajrami, Xhevahir
Hamiti, Mentor
IFAC PAPERSONLINE, 2016, 49 (29) : 259 - 262
[44] GlobalPhone: A Multilingual Text & Speech Database in 20 Languages
Schultz, Tanja
Ngoc Thang Vu
Schlippe, Tim
2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 8126 - 8130
[45] Forensic Speaker Identification: a Tutorial
Univaso, Pedro
IEEE LATIN AMERICA TRANSACTIONS, 2017, 15 (09) : 1754 - 1770
[46] Simultaneous speaker identification and watermarking
Basant S. Abd El-Wahab
Heba A. El-khobby
Mustafa M. Abd Elnaby
Fathi E. Abd El-Samie
International Journal of Speech Technology, 2021, 24 : 205 - 218
[47] Simultaneous speaker identification and watermarking
Abd El-Wahab, Basant S.
El-khobby, Heba A.
Abd Elnaby, Mustafa M.
Abd El-Samie, Fathi E.
INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2021, 24 (01) : 205 - 218
[48] Forensic speaker profiling in a Hungarian speech corpus
Beke, Andras
2018 9TH IEEE INTERNATIONAL CONFERENCE ON COGNITIVE INFOCOMMUNICATIONS (COGINFOCOM), 2018, : 379 - 384
[49] Limited data speaker identification
Jayanna, H. S.
Prasanna, S. R. Mahadeva
SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 2010, 35 (05) : 525 - 546
[50] Text independent speaker identification with finite multivariate generalised Gaussian mixture model and k-means algorithm
Sailaja, V.
Rao, K. Srinivasa
Reddy, K. V. V. S.
INTERNATIONAL JOURNAL OF SIGNAL AND IMAGING SYSTEMS ENGINEERING, 2013, 6 (02) : 119 - 126
