Scale-invariant MFCCs for speech/speaker recognition

Cited by: 1
Authors
Tufekci, Zekeriya [1]
Disken, Gokay [2]
Affiliations
[1] Cukurova Univ, Fac Engn, Dept Comp Engn, Adana, Turkey
[2] Adana Sci & Technol Univ, Fac Engn, Dept Elect & Elect Engn, Adana, Turkey
Keywords
Feature extraction; speaker recognition; speech recognition; speech
DOI
10.3906/elk-1901-231
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The feature extraction process is a fundamental part of speech processing. Mel frequency cepstral coefficients (MFCCs) are the most commonly used feature type in the speech/speaker recognition literature. However, the MFCC framework may face numerical issues or dynamic range problems, which degrade its performance. A practical solution to these problems is to add a constant to the filter-bank magnitudes before log compression, but this violates the scale-invariance property. In this work, a magnitude normalization and a multiplication constant are introduced to make the MFCCs scale-invariant and to avoid dynamic range expansion of nonspeech frames. Speaker verification experiments are conducted to show the effectiveness of the proposed scheme.
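The scale-invariance issue described in the abstract can be illustrated with a small numerical sketch. This is not the formula from the paper: the peak normalization, the multiplier `k_mult`, the offset value, and all function names below are assumptions chosen only for illustration. The sketch compares cepstra computed as the DCT of `log(E + c)` with cepstra computed after the filter-bank magnitudes `E` are first divided by their peak value and multiplied by a constant, and checks how each reacts to a change in input gain.

```python
import numpy as np


def dct2(x):
    """Unnormalized DCT-II along the last axis."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]   # cepstral index
    m = np.arange(n)[None, :]   # filter-bank channel index
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T


def cepstra_with_offset(fbank, offset=1.0):
    """Add a constant before the log (the common practical fix)."""
    return dct2(np.log(fbank + offset))


def cepstra_normalized(fbank, k_mult=1000.0, offset=1.0):
    """Hypothetical variant: normalize by the peak magnitude of the
    utterance, rescale by a constant, then add the offset and take the log.
    The exact normalization used in the paper is not given in the abstract."""
    peak = fbank.max()
    return dct2(np.log(k_mult * fbank / peak + offset))


# Synthetic filter-bank magnitudes: 4 frames, 20 channels (made-up data).
rng = np.random.default_rng(0)
fbank = rng.uniform(0.01, 5.0, size=(4, 20))
gain = 37.0  # arbitrary change in recording level

# With a plain offset, scaling the input changes the cepstra.
offset_changed = not np.allclose(
    cepstra_with_offset(fbank), cepstra_with_offset(gain * fbank)
)

# With normalization first, the gain cancels before the offset is added.
normalized_same = np.allclose(
    cepstra_normalized(fbank), cepstra_normalized(gain * fbank)
)

print(offset_changed, normalized_same)
```

In this sketch the gain cancels in the ratio `fbank / peak`, so the added offset no longer interacts with the recording level. The multiplier controls how strongly the offset compresses low-energy channels, which is one plausible reading of how a multiplication constant could limit dynamic range expansion in nonspeech frames.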
Pages: 3758-3762
Number of pages: 5