A clustering based feature selection method in spectro-temporal domain for speech recognition

Cited by: 14
Authors
Esfandian, Nafiseh [1]
Razzazi, Farbod [1]
Behrad, Alireza [2]
Affiliations
[1] Islamic Azad Univ, Sci & Res Branch, Dept Elect & Comp Engn, Tehran, Iran
[2] Shahed Univ, Fac Engn, Tehran, Iran
Keywords
Speech recognition; Spectro-temporal model; Feature extraction; Clustering; Gaussian mixture models; Weighted K-means; Representations
DOI
10.1016/j.engappai.2012.04.004
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Spectro-temporal representation of speech has become one of the leading signal representation approaches in speech recognition systems in recent years. However, this representation suffers from the high dimensionality of the feature space, which makes the domain unsuitable for practical speech recognition systems. In this paper, a new clustering-based method is proposed for secondary feature selection/extraction in the spectro-temporal domain. In the proposed representation, Gaussian mixture model (GMM) and weighted K-means (WKM) clustering techniques are applied to the spectro-temporal domain to reduce the dimensionality of the feature space. The elements of the centroid vectors and covariance matrices of the clusters are taken as the attributes of the secondary feature vector of each frame. To evaluate the proposed approach, the new feature vectors were tested on phoneme classification within the main phoneme categories of the TIMIT database. With the proposed secondary feature vector, the classification rate improved significantly for different sets of phonemes compared with MFCC features. The average improvement in the classification rate of voiced plosives over MFCC features is 5.9% using WKM clustering and 6.4% using GMM clustering. The greatest improvement, about 7.4%, is obtained using WKM clustering in the classification of front vowels. (C) 2012 Elsevier Ltd. All rights reserved.
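The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the spectro-temporal coordinates, the weighting, the number of clusters, or the covariance structure, so the 2-D grid, the use of per-point energy as the weight, `k=3`, and the diagonal (per-axis) variances below are all assumptions, and the function names `weighted_kmeans` and `secondary_features` are hypothetical.

```python
import numpy as np

def weighted_kmeans(points, weights, k, n_iter=50, seed=0):
    """Weighted K-means: each point pulls its centroid in proportion to its weight.
    Assumes non-negative weights with at least k strictly positive entries."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct points, favouring high-weight ones.
    idx = rng.choice(len(points), size=k, replace=False, p=weights / weights.sum())
    centroids = points[idx].astype(float)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            mask = labels == j
            if weights[mask].sum() > 0:  # keep the old centroid for empty clusters
                new_centroids[j] = np.average(points[mask], axis=0, weights=weights[mask])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Final assignment, so labels always match the returned centroids.
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, d2.argmin(axis=1)

def secondary_features(points, weights, k):
    """Concatenate each cluster's centroid and weighted per-axis variance
    (a diagonal-covariance simplification, assumed for this sketch)."""
    centroids, labels = weighted_kmeans(points, weights, k)
    dim = points.shape[1]
    parts = []
    for j in range(k):
        mask = labels == j
        w = weights[mask]
        if w.sum() > 0:
            var = np.average((points[mask] - centroids[j]) ** 2, axis=0, weights=w)
        else:
            var = np.zeros(dim)
        parts.append(np.concatenate([centroids[j], var]))
    # Sort clusters by centroid so the feature layout does not depend on initialisation order.
    parts.sort(key=lambda f: tuple(f[:dim]))
    return np.concatenate(parts)

# Hypothetical frame: an 8x8 grid of spectro-temporal coordinates with random energies.
rng = np.random.default_rng(1)
grid = np.stack(np.meshgrid(np.arange(8), np.arange(8), indexing="ij"), axis=-1)
grid = grid.reshape(-1, 2).astype(float)
energy = rng.random(len(grid)) + 1e-3
features = secondary_features(grid, energy, k=3)
print(features.shape)  # (12,)
```

With `k` clusters in a `d`-dimensional domain, this layout yields `2 * k * d` values per frame, which is how a large spectro-temporal grid collapses to a short secondary feature vector. A GMM-based variant would use the fitted component means and covariances in the same role.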
Pages: 1194-1202
Number of pages: 9
Related papers
50 in total
  • [41] SPECTRO-TEMPORAL SUBBAND WIENER FILTER FOR SPEECH ENHANCEMENT
    Hsu, Chung-Chien
    Lin, Tse-En
    Chen, Jian-Hueng
    Chi, Tai-Shih
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4001 - 4004
  • [42] Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition
    Geng, Mengzhe
    Xie, Xurong
    Ye, Zi
    Wang, Tianzi
    Li, Guinan
    Hu, Shujie
    Liu, Xunying
    Meng, Helen
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 2597 - 2611
  • [43] AUTOMATIC RECOGNITION OF SPEECH EMOTION USING LONG-TERM SPECTRO-TEMPORAL FEATURES
    Wu, Siqing
    Falk, Tiago H.
    Chan, Wai-Yip
    2009 16TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING, VOLS 1 AND 2, 2009, : 205 - 210
  • [44] Spectro-temporal modulation glimpsing for speech intelligibility prediction
    Edraki, Amin
    Chan, Wai-Yip
    Jensen, Jesper
    Fogerty, Daniel
    HEARING RESEARCH, 2022, 426
  • [45] Spectro-temporal modulation transfer functions and speech intelligibility
    Chi, TS
    Gao, YJ
    Guyton, MC
    Ru, PW
    Shamma, S
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1999, 106 (05): : 2719 - 2732
  • [46] Spectro-temporal weighting of interaural time differences in speech
    Baltzell, Lucas S.
    Cho, Adrian Y.
    Swaminathan, Jayaganesh
    Best, Virginia
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2020, 147 (06): : 3883 - 3894
  • [47] A Phoneme Recognition Framework based on Auditory Spectro-Temporal Receptive Fields
    Thomas, Samuel
    Patil, Kailash
    Ganapathy, Sriram
    Mesgarani, Nima
    Hermansky, Hynek
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2458 - 2461
  • [48] Combining feature space discriminative training with long-term spectro-temporal features for noise-robust speech recognition
    Fukuda, Takashi
    Ichikawa, Osamu
    Nishimura, Masafumi
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 236 - 239
  • [49] Spectro-temporal processing in the envelope-frequency domain
    Ewert, SD
    Verhey, JL
    Dau, T
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2002, 112 (06): : 2921 - 2931
  • [50] Temporal feature selection for noisy speech recognition
    Department of Computer Science and Software Engineering, Université Laval, Quebec
    QC
    G1V 0A6, Canada
    Lect. Notes Comput. Sci., (155-166):