EFFECTIVENESS OF MULTISCALE FRACTAL DIMENSION FOR IMPROVEMENT OF FRAME CLASSIFICATION RATE

Cited by: 0
Authors
Zaki, Mohammadi [1 ]
Shah, Nirmesh J. [1 ]
Patil, Hemant A. [1 ]
Affiliations
[1] Dhirubhai Ambani Inst Informat & Commun Technol D, Gandhinagar 382007, India
Source
2015 23RD EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO) | 2015
Keywords
fractal dimension; multiscale analysis; phoneme-based frame classification; nonlinearity;
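DOI
Not available
Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code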
0808 ; 0809 ;
Abstract
We propose to use multiscale fractal dimension (FD)-based features for the phoneme classification task at the frame level. During speech production, turbulence is created, and hence vortices (generated by separated airflow) may travel along the vocal tract and excite the vocal tract resonators. This turbulence, and in effect the embedded features of different phoneme classes, can be captured by the invariance property of multiscale FD. To capture complementary information, feature-level fusion of the proposed feature with state-of-the-art Mel Frequency Cepstral Coefficients (MFCC) is attempted and found to be effective. In particular, single-hidden-layer neural nets were trained to compute the frame classification rate. The proposed feature reduced the error rate by over 1.6 % relative to MFCC features on the TIMIT database. This is supported by a significant reduction in % EER (i.e., 0.327 % to 4.795 %).
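The abstract does not say which FD estimator or fusion scheme the authors used, so the following is only an illustrative sketch. It assumes a morphological-cover (Minkowski-Bouligand) estimate, where the cover area A(s) at scale s behaves like s^(2 - D), and it assumes that feature-level fusion means simple concatenation with a per-frame MFCC vector. The function names `cover_areas`, `multiscale_fd` and `fuse`, and the parameters `max_scale` and `fit_width`, are hypothetical.

```python
import numpy as np


def cover_areas(x, max_scale):
    """Area between flat dilation and erosion of x for scales 1..max_scale."""
    x = np.asarray(x, dtype=float)
    areas = np.empty(max_scale)
    for s in range(1, max_scale + 1):
        # Edge padding keeps one window of length 2s+1 per input sample.
        padded = np.pad(x, s, mode="edge")
        windows = np.lib.stride_tricks.sliding_window_view(padded, 2 * s + 1)
        areas[s - 1] = np.sum(windows.max(axis=1) - windows.min(axis=1))
    return areas


def multiscale_fd(x, max_scale=10, fit_width=4):
    """Local FD estimates: 2 minus the local slope of log A(s) vs log s.

    Assumed estimator (not confirmed by the abstract). Returns one value
    per sliding group of `fit_width` consecutive scales.
    """
    log_s = np.log(np.arange(1, max_scale + 1))
    # Small offset avoids log(0) for constant frames.
    log_a = np.log(cover_areas(x, max_scale) + 1e-12)
    fds = []
    for start in range(max_scale - fit_width + 1):
        sl = slice(start, start + fit_width)
        slope = np.polyfit(log_s[sl], log_a[sl], 1)[0]
        fds.append(2.0 - slope)
    return np.array(fds)


def fuse(mfcc_frame, mfd_frame):
    """Feature-level fusion, assumed here to be plain concatenation."""
    return np.concatenate([np.asarray(mfcc_frame), np.asarray(mfd_frame)])
```

Under these assumptions a smooth ramp gives values close to 1 and a white-noise frame gives clearly larger values, which is the kind of contrast a frame classifier could exploit. The fused vector would then be fed to whatever classifier is used; the abstract mentions single-hidden-layer neural nets.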
Pages: 1018-1022
Page count: 5
Related papers
28 in total
[1]  
[Anonymous], 1983, FRACTAL GEOMETRY NAT
[2]  
Baljekar PN, 2012, INT CONF ACOUST SPEE, P4461, DOI 10.1109/ICASSP.2012.6288910
[3]   Enhancing the Feature Extraction Process for Automatic Speech Recognition with Fractal Dimensions [J].
Ezeiza, Aitzol ;
Lopez de Ipina, Karmele ;
Hernandez, Carmen ;
Barroso, Nora .
COGNITIVE COMPUTATION, 2013, 5 (04) :545-550
[4]  
Frisch U., 1985, Fully Developed Turbulence and Intermittency in Turbulence and Predictability in Geophysical Fluid Dynamics and Climate Dynamics
[5]  
Frisch U, 1999, TURBULENCE LEGACY AN
[6]  
Garofolo J., 1988, Getting started with the DARPA TIMIT CD-ROM: An acoustic phonetic continuous speech database
[7]  
Gonzalez D. C., 2012, PROGR PATTERN RECOGN, V7441, P740, DOI 10.1007/978-3-642-33275-3
[8]  
Hadjileontiadis LJ, 2007, IEEE ENG MED BIOL, V26, P30, DOI 10.1109/MEMB.2007.289119
[9]   Characterization of Healthy and Pathological Voice Through Measures Based on Nonlinear Dynamics [J].
Henriquez, Patricia ;
Alonso, Jesus B. ;
Ferrer, Miguel A. ;
Travieso, Carlos M. ;
Godino-Llorente, Juan I. ;
Diaz-de-Maria, Fernando .
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17 (06) :1186-1195
[10]  
Jiménez J, 2004, ARBOR, V178, P589