Scale-invariant MFCCs for speech/speaker recognition

Cited by: 1

Authors:

Tufekci, Zekeriya [1]

Disken, Gokay [2]

Affiliations:

[1] Cukurova Univ, Fac Engn, Dept Comp Engn, Adana, Turkey

[2] Adana Sci & Technol Univ, Fac Engn, Dept Elect & Elect Engn, Adana, Turkey

Source:

TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES | 2019, Vol. 27, No. 05

Keywords:

Feature extraction; speaker recognition; speech recognition; SPEECH;

DOI:

10.3906/elk-1901-231

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The feature extraction process is a fundamental part of speech processing. Mel frequency cepstral coefficients (MFCCs) are the most commonly used feature type in the speech/speaker recognition literature. However, the MFCC framework may face numerical issues or dynamic range problems, which decrease its performance. A practical solution to these problems is to add a constant to the filter-bank magnitudes before log compression, but this violates the scale-invariance property. In this work, a magnitude normalization and a multiplication constant are introduced to make the MFCCs scale-invariant and to avoid dynamic range expansion of nonspeech frames. Speaker verification experiments are conducted to show the effectiveness of the proposed scheme.
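
The scale-invariance issue described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formulas, so the normalization by the utterance-wide maximum, the `gain` multiplier, the `offset` constant, the random stand-in filter-bank values, and the function names `dct2` and `cepstra` are all assumptions made for illustration. The sketch checks three cases when the input magnitudes are scaled by a factor of 100: plain log compression (only the zeroth coefficient changes), log compression with an added constant (the other coefficients change too), and normalization before adding the constant (nothing changes).

```python
import numpy as np

def dct2(x):
    """Unnormalized DCT-II over the last axis (rows are frames)."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]   # cepstral index
    m = np.arange(n)[None, :]   # filter index
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T

def cepstra(fbank, offset=0.0, normalize=False, gain=1.0):
    """Log-compress filter-bank magnitudes and apply the DCT.

    offset    -- constant added before the log (hypothetical value)
    normalize -- divide by the utterance-wide maximum first (assumed scheme)
    gain      -- multiplier applied after normalization (assumed scheme)
    """
    if normalize:
        fbank = gain * fbank / fbank.max()
    return dct2(np.log(fbank + offset))

# Stand-in filter-bank magnitudes: 5 frames, 20 filters (random, not real speech).
rng = np.random.default_rng(0)
fbank = rng.uniform(0.1, 10.0, size=(5, 20))
scaled = 100.0 * fbank  # same signal at a different recording level

# Plain log: scaling adds log(100) to every filter, which only shifts c0.
plain_invariant = np.allclose(cepstra(fbank)[:, 1:], cepstra(scaled)[:, 1:])

# Added constant: log(a*m + 1) is no longer log(a) + log(m + 1).
offset_invariant = np.allclose(
    cepstra(fbank, offset=1.0)[:, 1:], cepstra(scaled, offset=1.0)[:, 1:]
)

# Normalize first, then add the constant: the scale factor cancels out.
normalized_invariant = np.allclose(
    cepstra(fbank, offset=1.0, normalize=True, gain=50.0),
    cepstra(scaled, offset=1.0, normalize=True, gain=50.0),
)

print(plain_invariant, offset_invariant, normalized_invariant)
```

In this sketch the `gain` value only sets how large the normalized magnitudes are relative to `offset`; how the paper chooses its multiplication constant is not stated in the abstract.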


Pages: 3758-3762

Page count: 5

50 records in total

[1] Speaker recognition via fusion of subglottal features and MFCCs
Arsikere, Harish
Gupta, Hitesh Anand
Alwan, Abeer
15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014 : 1106 - 1110
[2] Speech Recognition Combining MFCCs and Image Features
Karlos, Stamatis
Fazakis, Nikos
Karanikola, Katerina
Kotsiantis, Sotiris
Sgarbas, Kyriakos
SPEECH AND COMPUTER, 2016, 9811 : 651 - 658
[3] Domain Invariant Feature Learning for Speaker-Independent Speech Emotion Recognition
Lu, Cheng
Zong, Yuan
Zheng, Wenming
Li, Yang
Tang, Chuangao
Schuller, Bjoern W.
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 2217 - 2230
[4] Contemporary Speech/Speaker Recognition with Speech from Impaired Vocal Apparatus
Nidhyananthan, S. Selva
Selvakumari, R. Shantha
Shenbagalakshmi, V.
2014 INTERNATIONAL CONFERENCE ON COMMUNICATION AND NETWORK TECHNOLOGIES (ICCNT), 2014 : 198 - 202
[5] Speech recognition as feature extraction for speaker recognition
Stolcke, A.
Shriberg, E.
Ferrer, L.
Kajarekar, S.
Sonmez, K.
Tur, G.
2007 IEEE WORKSHOP ON SIGNAL PROCESSING APPLICATIONS FOR PUBLIC SECURITY AND FORENSICS, 2007 : 39 - +
[6] Scale-Invariant Representation of Light Field Images for Object Recognition and Tracking
Ghasemi, Alireza
Vetterli, Martin
COMPUTATIONAL IMAGING XII, 2014, 9020
[7] Learnable MFCCs for Speaker Verification
Liu, Xuechen
Sahidullah, Md
Kinnunen, Tomi
2021 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2021
[8] RobinNet: A Multimodal Speech Emotion Recognition System With Speaker Recognition for Social Interactions
Khurana, Yash
Gupta, Swamita
Sathyaraj, R.
Raja, S. P.
IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2022, 11 (01) : 478 - 487
[9] Contextual invariant-integration features for improved speaker-independent speech recognition
Mueller, Florian
Mertins, Alfred
SPEECH COMMUNICATION, 2011, 53 (06) : 830 - 841
[10] Search in speech, language identification and speaker recognition in Speech@FIT
Cernocky, Jan
Burget, Lukas
Schwarz, Petr
Matejka, Pavel
Karafiat, Martin
Glembek, Ondrej
Kopecky, Jiri
Szoeke, Igor
Fapso, Michal
Grezl, Frantisek
Hubeika, Valiantsina
Oparin, Ilya
2007 17TH INTERNATIONAL CONFERENCE RADIOELEKTRONIKA, VOLS 1 AND 2, 2007 : 132 - +
