Time-Frequency Matrix Feature Extraction and Classification of Environmental Audio Signals

Cited by: 89

Authors:

Ghoraani, Behnaz [1]

Krishnan, Sridhar [1]

Affiliations:

[1] Ryerson Univ, Dept Elect & Comp Engn, Toronto, ON L4C 9R5, Canada

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2011 / Vol. 19 / No. 07

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC);

Keywords:

Environmental audio classification; matching pursuit time-frequency distribution; non-negative matrix factorization (NMF); time-frequency quantification; time-frequency matrix feature extraction; ALGORITHMS; TRANSFORM;

DOI:

10.1109/TASL.2011.2118753

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Audio feature extraction and classification are important tools for audio signal analysis in many applications, such as multimedia indexing and retrieval, and auditory scene analysis. However, due to the nonstationarities and discontinuities that exist in these signals, their quantification and classification remain a formidable challenge. In this paper, we develop a new approach for audio feature extraction to effectively quantify these nonstationarities in an attempt to achieve high classification accuracy for environmental audio signals. Our approach consists of three stages: first, we propose to construct the time-frequency matrix (TFM) of audio signals using the matching-pursuit time-frequency distribution (MP-TFD) technique; then we apply the non-negative matrix factorization (NMF) technique to decompose the TFM into its significant components. Finally, we propose seven novel features from the spectral and temporal structures of the decomposed vectors in a way that they successfully represent the joint TF structure of the audio signal, and combine them with the Mel-frequency cepstral coefficient (MFCC) features. These features are examined using a database of 192 environmental audio signals that includes 20 aircraft, 17 helicopter, 20 drum, 15 flute, 20 piano, 20 animal, 20 bird, and 20 insect sounds, and the speech of 20 males and 20 females. The results of the numerical simulation support the effectiveness of the proposed approach for environmental audio classification, with over 10% accuracy-rate improvement compared to the MFCC features.
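
The decomposition stage described in the abstract can be illustrated with a minimal sketch. It assumes a small hand-made non-negative matrix in place of a real MP-TFD output (the MP-TFD construction itself is not shown), and it uses the Lee-Seung multiplicative updates as one common NMF algorithm; the abstract does not state which NMF variant the authors use. The helper names (`matmul`, `transpose`, `frobenius_error`, `nmf`) are hypothetical, and plain Python lists are used only so the sketch runs without dependencies.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def frobenius_error(V, W, H):
    """Frobenius norm of the residual V - W H."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2
               for rv, rwh in zip(V, WH)
               for v, wh in zip(rv, rwh)) ** 0.5


def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Approximate V (freq x time) as W (freq x rank) times H (rank x time).

    Uses Lee-Seung multiplicative updates for the Frobenius objective;
    this choice of algorithm is an assumption, not taken from the paper.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    H = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(rh, rn, rd)]
             for rh, rn, rd in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(rw, rn, rd)]
             for rw, rn, rd in zip(W, num, den)]
    return W, H


# Toy stand-in for a TFM: 4 frequency bins x 6 time frames, with one
# component active in the first three frames and another in the last three.
V = [
    [5.0, 5.0, 5.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 2.0, 2.0, 2.0],
    [0.0, 0.0, 0.0, 4.0, 4.0, 4.0],
]
W, H = nmf(V, rank=2)
error = frobenius_error(V, W, H)
```

In this sketch the columns of `W` play the role of spectral vectors and the rows of `H` the role of temporal vectors, which is the kind of decomposed structure the abstract says the features are derived from. For real signals a numerical library would normally replace the list-based helpers.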


Pages: 2197-2209

Page count: 13

