Text-Independent Speaker Recognition System Using Feature-Level Fusion for Audio Databases of Various Sizes

Authors
Chauhan N. [1]
Isshiki T. [1]
Li D. [1]
Affiliation
[1] Department of Information and Communication Engineering, Tokyo Institute of Technology, Tokyo
Keywords
Biometric system; Feature-level fusion; SI accuracy; Speaker recognition; SV EER
DOI
10.1007/s42979-023-02056-w
Abstract
To improve the speaker recognition rate, we propose a speaker recognition model based on the fusion of different kinds of speech features. The proposed feature aggregation methodology combines 18 features in total: mel frequency cepstral coefficient (MFCC), linear predictive coding (LPC), perceptual linear prediction (PLP), root mean square (RMS), centroid, and entropy features, along with their delta (Δ) and delta–delta (ΔΔ) feature vectors. The approach is tested on five speech corpora of different sizes: NIST-2008, VoxForge, ELSDSR, VCTK, and VoxCeleb1. The results are evaluated using the MATLAB Classification Learner application with the linear discriminant (LD), K-nearest neighbor (KNN), and ensemble classifiers. For the NIST-2008 and VoxForge datasets, the best speaker identification (SI) accuracies of 96.9% and 100% and the lowest speaker verification (SV) equal error rates (EER) of 0.2% and 0% are achieved with the LD and KNN classifiers, respectively. For the VCTK and ELSDSR datasets, the best SI accuracy of 100% and the lowest SV EER of 0% are achieved with all three classifiers using different feature-level fusion approaches. On the VoxCeleb1 database, the highest SI accuracy and lowest EER are 90% and 4.07%, respectively, both obtained with the KNN classifier. The experimental results show that fusing different features with their delta and delta–delta values increases SI accuracy by 10–50% and reduces the SV EER relative to a single feature. © 2023, The Author(s).
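As a rough illustration of the feature-level fusion described in the abstract, the sketch below appends delta (Δ) and delta-delta (ΔΔ) vectors to each base feature and concatenates everything frame by frame. It is not the authors' implementation: the function names `delta` and `fuse`, the regression window of two frames, the per-feature dimensions, and the random placeholder data are all assumptions made for the example, and real MFCC, LPC, PLP, RMS, centroid, and entropy values would come from a separate feature extractor.

```python
from typing import Mapping

import numpy as np


def delta(features: np.ndarray, width: int = 2) -> np.ndarray:
    """Regression-based time derivative of a (frames, dims) feature matrix.

    The window width of 2 frames is an assumed, commonly used default.
    """
    n_frames = features.shape[0]
    # Repeat the edge frames so every frame has `width` neighbours per side.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for k in range(1, width + 1):
        ahead = padded[width + k : width + k + n_frames]    # frames t + k
        behind = padded[width - k : width - k + n_frames]   # frames t - k
        out += k * (ahead - behind)
    return out / denom


def fuse(feature_sets: Mapping[str, np.ndarray]) -> np.ndarray:
    """Concatenate each base feature with its delta and delta-delta."""
    blocks = []
    for feats in feature_sets.values():
        feats = np.asarray(feats, dtype=float)
        d1 = delta(feats)   # delta
        d2 = delta(d1)      # delta-delta
        blocks.extend([feats, d1, d2])
    return np.hstack(blocks)


# Placeholder data: random numbers stand in for real per-frame features.
# The dimensions (13, 12, 13, 1, 1, 1) are illustrative assumptions.
rng = np.random.default_rng(0)
n_frames = 50
base = {
    "mfcc": rng.normal(size=(n_frames, 13)),
    "lpc": rng.normal(size=(n_frames, 12)),
    "plp": rng.normal(size=(n_frames, 13)),
    "rms": rng.normal(size=(n_frames, 1)),
    "centroid": rng.normal(size=(n_frames, 1)),
    "entropy": rng.normal(size=(n_frames, 1)),
}
fused = fuse(base)
print(fused.shape)  # (50, 123): 41 base dimensions x 3 (base, delta, delta-delta)
```

Six base feature types, each with its Δ and ΔΔ, give the 18 feature groups mentioned in the abstract. The fused matrix could then be passed to any frame-level or utterance-level classifier.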