Efficient speaker identification using spectral entropy

Cited by: 7
Authors
Luque-Suarez, Fernando [1]
Camarena-Ibarrola, Antonio [2]
Chavez, Edgar [1]
Affiliations
[1] CICESE, Ensenada, Baja California, Mexico
[2] Univ Michoacana, Morelia, Michoacan, Mexico
Keywords
Speaker recognition; Speaker identification; Entropygrams; RECOGNITION
DOI
10.1007/s11042-018-7035-9
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In voice recognition, the two main problems are speech recognition (what was said) and speaker recognition (who was speaking). The usual approach to speaker recognition is to postulate a model in which the speaker identity corresponds to the model parameters, whose estimation can be time-consuming when the number of candidate speakers is large. In this paper, we model the speaker as a high-dimensional point cloud of entropy-based features extracted from the speech signal. The method allows indexing, and hence it can manage large databases. We experimentally assessed the quality of the identification with a publicly available database formed by extracting audio from a collection of YouTube videos of 1,000 different speakers. With 20-second audio excerpts, we were able to identify a speaker with 97% accuracy when the recording environment is not controlled, and with 99% accuracy for controlled recording environments.
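
The idea in the abstract (a speaker represented as a cloud of entropy-based feature points, matched by proximity search) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names `spectral_entropy_frames` and `identify`, the frame length, hop size, number of bands, and the brute-force nearest-neighbor vote, which stands in for whatever index the authors actually use.

```python
import numpy as np

def spectral_entropy_frames(signal, frame_len=512, hop=256, n_bands=8):
    """Return an (n_frames, n_bands) array of per-band spectral entropies.

    Hypothetical feature extractor: each frame's power spectrum is split
    into n_bands bands, and the Shannon entropy (in bits) of the
    normalized power inside each band becomes one coordinate.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    feats = np.empty((n_frames, n_bands))
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        for b, band in enumerate(np.array_split(power, n_bands)):
            p = band / (band.sum() + 1e-12)  # normalize to a distribution
            feats[i, b] = -np.sum(p * np.log2(p + 1e-12))
    return feats

def identify(query_feats, enrolled):
    """Pick the enrolled speaker whose point cloud wins the most votes.

    enrolled maps a speaker name to an (n_i, d) array of feature points.
    Each query point votes for the speaker that owns its nearest point.
    Brute-force search is used here; a real system would use an index.
    """
    names = list(enrolled)
    # dists[s, q] = distance from query point q to the closest point of speaker s
    dists = np.stack([
        np.linalg.norm(
            query_feats[:, None, :] - enrolled[n][None, :, :], axis=2
        ).min(axis=1)
        for n in names
    ])
    winners = dists.argmin(axis=0)
    counts = np.bincount(winners, minlength=len(names))
    return names[int(counts.argmax())]
```

In this sketch, enrolling a speaker amounts to storing the output of `spectral_entropy_frames` for that speaker's audio, so no per-speaker model parameters are estimated; identification is a vote over nearest neighbors of the query points.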
Pages: 16803-16815
Number of pages: 13
Related papers
50 in total
  • [31] Speaker Recognition using Spectral Dimension Features
    Chen, Wen-Shiung
    Huang, Jr-Feng
    2009 FOURTH INTERNATIONAL MULTI-CONFERENCE ON COMPUTING IN THE GLOBAL INFORMATION TECHNOLOGY (ICCGI 2009), 2009, : 132 - 137
  • [32] Speaker Identification Using Bagging Techniques
    Indumathi, A.
    Chandra, E.
    2015 INTERNATIONAL CONFERENCE ON COMPUTERS, COMMUNICATIONS, AND SYSTEMS (ICCCS), 2015, : 223 - 229
  • [33] Video classification using speaker identification
    Patel, NV
    Sethi, IK
    STORAGE AND RETRIEVAL FOR IMAGE AND VIDEO DATABASES V, 1997, 3022 : 218 - 225
  • [34] Text-Independent Speaker Identification Using Vowel Formants
    Almaadeed, Noor
    Aggoun, Amar
    Amira, Abbes
    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2016, 82 (03): : 345 - 356
  • [35] Emotional speaker identification using a novel capsule nets model
    Nassif, Ali Bou
    Shahin, Ismail
    Elnagar, Ashraf
    Velayudhan, Divya
    Alhudhaif, Adi
    Polat, Kemal
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 193
  • [36] Simple and Efficient Speaker Comparison using Approximate KL Divergence
    Campbell, W. M.
    Karam, Z. N.
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 362 - 365
  • [37] Speaker identification using fuzzy i-vector tree
    Galka, Jakub
    Jaciow, Pawel
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2019, 37 (04) : 4937 - 4949
  • [38] HANDS-FREE SPEAKER IDENTIFICATION BASED ON SPECTRAL SUBTRACTION USING A MULTI-CHANNEL LEAST MEAN SQUARE APPROACH
    Wang, Longbiao
    Zhang, Zhaofeng
    Kai, Atsuhiko
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7224 - 7228
  • [39] Speaker Modeling Using Emotional Speech for More Robust Speaker Identification
    M. Milošević
    Ž. Nedeljković
    U. Glavitsch
    Ž. Đurović
    Journal of Communications Technology and Electronics, 2019, 64 : 1256 - 1265
  • [40] A Novel Speech Enhancement Method Using Fourier Series Decomposition and Spectral Subtraction for Robust Speaker Identification
    Siam, Ali I.
    El-khobby, Heba A.
    Abd Elnaby, Mustafa M.
    Abdelkader, Hatem S.
    Abd El-Samie, Fathi E.
    WIRELESS PERSONAL COMMUNICATIONS, 2019, 108 (02) : 1055 - 1068