Dimensionality Reduction of Modulation Frequency Features for Speech Discrimination

Cited by: 0
Authors
Markaki, Maria [1]
Stylianou, Yannis [1]
Affiliations
[1] Univ Crete, Dept Comp Sci, Khania, Greece
Source
INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 | 2008
Keywords
modulation spectrum; multilinear algebra; feature selection; mutual information; speech discrimination
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We describe a dimensionality reduction method for modulation spectral features that preserves the time-varying information relevant to the classification task. Because the acoustic and modulation frequency subspaces differ in redundancy and discriminative power, we first apply a generalization of SVD to tensors (Higher Order SVD) to reduce the dimensionality. Projecting the modulation spectral features onto the principal axes with the highest energy in each subspace yields a compact feature set. We then estimate the relevance of these projections to speech discrimination from their mutual information with the target class. Reconstructing modulation spectrograms from the "best" 22 features back to the original dimensions shows that modulation spectral features close to syllable and phoneme rates, as well as the pitch values of speakers, are preserved.
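The pipeline outlined in the abstract (per-subspace truncation via Higher Order SVD, relevance ranking by mutual information, and reconstruction from the selected features) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the synthetic data, the tensor layout (samples × acoustic frequency × modulation frequency), the ranks, the number of selected features, and the simple histogram estimator of mutual information.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def hosvd_axes(data, ranks):
    """Leading singular vectors of the acoustic and modulation unfoldings.

    data  : array of shape (n_samples, n_acoustic, n_modulation) (assumed layout)
    ranks : (r_acoustic, r_modulation), axes kept in each subspace
    """
    axes = []
    for mode, r in zip((1, 2), ranks):
        u, _, _ = np.linalg.svd(unfold(data, mode), full_matrices=False)
        axes.append(u[:, :r])  # columns are ordered by decreasing singular value
    return axes

def project(data, axes):
    """Project every sample on the kept axes; one flat feature row per sample."""
    u_a, u_m = axes
    core = np.einsum('naf,ai,fj->nij', data, u_a, u_m)
    return core.reshape(data.shape[0], -1)

def mutual_information(feature, labels, bins=8):
    """Histogram estimate, in bits, of I(feature; label) (illustrative estimator)."""
    edges = np.histogram_bin_edges(feature, bins=bins)
    f = np.digitize(feature, edges[1:-1])  # bin indices in 0..bins-1
    classes, y = np.unique(np.asarray(labels).ravel(), return_inverse=True)
    joint = np.zeros((bins, classes.size))
    np.add.at(joint, (f, y), 1.0)          # count co-occurrences
    joint /= joint.sum()
    pf = joint.sum(axis=1, keepdims=True)  # marginal over feature bins
    py = joint.sum(axis=0, keepdims=True)  # marginal over classes
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (pf @ py)[nz])))

def reconstruct(features, keep, axes):
    """Map the selected features back to the original dimensions."""
    u_a, u_m = axes
    masked = np.zeros_like(features)
    masked[:, keep] = features[:, keep]    # discard unselected projections
    core = masked.reshape(-1, u_a.shape[1], u_m.shape[1])
    return np.einsum('nij,ai,fj->naf', core, u_a, u_m)

# Synthetic stand-in for modulation spectra (hypothetical data, two classes).
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 20)
data = rng.random((40, 6, 5))
data[labels == 1, 2, 1] += 2.0             # class-dependent modulation energy

axes = hosvd_axes(data, ranks=(3, 2))
feats = project(data, axes)                # shape (40, 3 * 2)
scores = np.array([mutual_information(feats[:, k], labels)
                   for k in range(feats.shape[1])])
top = np.argsort(scores)[::-1][:2]         # indices of the most relevant features
approx = reconstruct(feats, top, axes)     # shape (40, 6, 5)
```

Ranking each projection independently, as done here, measures relevance only; it does not account for redundancy among the selected features, and the abstract does not say whether the authors' selection criterion does.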
Pages: 646-649
Page count: 4