Text-independent speaker identification based on deep Gaussian correlation supervector

Cited by: 0
Authors
Linhui Sun
Ting Gu
Keli Xie
Jia Chen
Affiliations
[1] Nanjing University of Posts and Telecommunications, College of Telecommunications & Information Engineering
[2] Ministry of Education, Key Lab of Broadband Wireless Communication and Sensor Network Technology
[3] Nanjing University of Posts and Telecommunications
Source
International Journal of Speech Technology | 2019 / Vol. 22
Keywords
Gaussian mixture model; Deep neural network; Speaker identification; Bottleneck feature; Deep Gaussian correlation supervector
DOI
Not available
Abstract
Great progress has been made in speaker recognition by extracting features from a Gaussian mixture model (GMM) or a deep neural network (DNN). In this paper, to capture speaker-specific characteristics more accurately, we propose a novel deep Gaussian correlation supervector (DGCS) feature based on a DBN-GMM hybrid model. In this method, we first extract MFCC features from preprocessed speech signals and use a DBN to obtain bottleneck features. The bottleneck features are then fed to a GMM to extract a deep Gaussian supervector (DGS), which can serve as the input to an SVM for pattern discrimination and decision. To further account for the correlation between the deep mean vectors of the DGS, the DGS is transformed into the DGCS by supervector recombination. Our experiments show that using the DGCS improves the recognition rate by 17.979% compared to the system using only the supervector, by 18.22% compared to the system using the DGS, and by 1.875% compared to the system using the correlation supervector. In addition, the experiments with the proposed DGCS show that the time complexity of the identification task can be substantially reduced.
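The supervector-plus-SVM stage described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes scikit-learn, replaces the DBN bottleneck features with synthetic random frames, and fits each utterance's GMM from the means of a pooled background model instead of using any adaptation scheme from the paper. The supervector recombination step that produces the DGCS is not specified in the abstract and is therefore omitted. All names and sizes (`N_COMPONENTS`, `FEAT_DIM`, `synthetic_bottleneck`, `supervector`) are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

N_COMPONENTS = 4  # hypothetical number of Gaussian components
FEAT_DIM = 8      # hypothetical bottleneck feature dimension
rng = np.random.default_rng(0)


def synthetic_bottleneck(shift, n_frames=200):
    """Stand-in for DBN bottleneck features: random frames, not real speech."""
    return rng.normal(loc=shift, scale=1.0, size=(n_frames, FEAT_DIM))


def supervector(frames, background):
    """Fit a GMM to one utterance and stack its component means into a vector.

    Starting from the background model's means keeps the component order
    comparable across utterances (an assumption of this sketch).
    """
    gmm = GaussianMixture(
        n_components=N_COMPONENTS,
        covariance_type="diag",
        means_init=background.means_,
        random_state=0,
    )
    gmm.fit(frames)
    return gmm.means_.reshape(-1)


# Two synthetic "speakers", separated by a shift of the feature mean.
speaker_shifts = {0: 0.0, 1: 3.0}
train = [
    (synthetic_bottleneck(shift), spk)
    for spk, shift in speaker_shifts.items()
    for _ in range(5)
]

# Background GMM trained on all pooled training frames.
background = GaussianMixture(
    n_components=N_COMPONENTS, covariance_type="diag", random_state=0
)
background.fit(np.vstack([frames for frames, _ in train]))

# One supervector per utterance, classified by a linear SVM.
X_train = np.array([supervector(frames, background) for frames, _ in train])
y_train = np.array([spk for _, spk in train])
clf = SVC(kernel="linear").fit(X_train, y_train)

test_frames = [synthetic_bottleneck(shift) for shift in speaker_shifts.values()]
X_test = np.array([supervector(frames, background) for frames in test_frames])
preds = clf.predict(X_test)
print(preds)
```

Each utterance is reduced to a fixed-length vector of `N_COMPONENTS * FEAT_DIM` values regardless of its number of frames, which is what lets a standard SVM operate on variable-length speech.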
Pages: 449 - 457
Number of pages: 8
Related papers
50 items in total
  • [21] Text-independent speaker identification based on selection of the most similar feature vectors
    Soleymanpour M.
    Marvi H.
    Soleymanpour, Mohammad (Soleimanpour141@gmail.com), 1600, Springer Science and Business Media, LLC (20): : 99 - 108
  • [22] A two-level classifier for text-independent speaker identification
    Hadjitodorov, S
    Boyanov, B
    Dalakchieva, N
    SPEECH COMMUNICATION, 1997, 21 (03) : 209 - 217
  • [23] Text-Independent Speaker Identification by Combining MFCC and MVA Features
    Korba, Mohamed Cherif Amara
    Bourouba, Houcine
    Rafik, Djemili
    2018 INTERNATIONAL CONFERENCE ON SIGNAL, IMAGE, VISION AND THEIR APPLICATIONS (SIVA), 2018,
  • [24] Text-independent speaker identification utilizing likelihood normalization technique
    Markov, KP
    Nakagawa, S
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 1997, E80D (05) : 585 - 593
  • [25] HISTOGRAM TRANSFORM MODEL USING MFCC FEATURES FOR TEXT-INDEPENDENT SPEAKER IDENTIFICATION
    Yu, Hong
    Ma, Zhanyu
    Li, Minyue
    Guo, Jun
    CONFERENCE RECORD OF THE 2014 FORTY-EIGHTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, 2014, : 500 - 504
  • [26] Speaker Recognition Based on Fusion of a Deep and Shallow Recombination Gaussian Supervector
    Sun, Linhui
    Bu, Yunyi
    Zou, Bo
    Fu, Sheng
    Li, Pingan
    ELECTRONICS, 2021, 10 (01) : 1 - 21
  • [27] The Estimation and Kernel Metric of Spectral Correlation for Text-Independent Speaker Verification
    Wang, Eryu
    Lee, Kong Aik
    Ma, Bin
    Li, Haizhou
    Guo, Wu
    Dai, Lirong
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 1065 - +
  • [28] Text-independent Hakka Speaker Recognition in Noisy Environments
    Peng, Jie
    Chen, Chin-Ta
    Yang, Cheng-Fu
    SENSORS AND MATERIALS, 2025, 37 (01) : 441 - 451
  • [29] Text-independent speaker identification system based on the histogram of DCT-cepstrum coefficients
    Al-Rawahy, S.
    Hossen, A.
    Heute, U.
    INTERNATIONAL JOURNAL OF KNOWLEDGE-BASED AND INTELLIGENT ENGINEERING SYSTEMS, 2012, 16 (03) : 141 - 161
  • [30] DEEP BOTTLENECK FEATURES FOR I-VECTOR BASED TEXT-INDEPENDENT SPEAKER VERIFICATION
    Ghalehjegh, Sina Hamidi
    Rose, Richard C.
    2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 555 - 560