Audio signal feature extraction and classification using local discriminant bases

Cited by: 69
Authors
Umapathy, Karthikeyan [1]
Krishnan, Sridhar
Rao, Raveendra K.
Affiliations
[1] Univ Western Ontario, Dept Elect & Comp Engn, London, ON N6A 5B8, Canada
[2] Ryerson Univ, Dept Elect & Comp Engn, Toronto, ON M5B 2K3, Canada
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2007, Vol. 15, No. 4
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
audio classification; dissimilarity measures; feature extraction; linear discriminant analysis (LDA); local discriminant bases (LDB); wavelet packets;
DOI
10.1109/TASL.2006.885921
CLC Number
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
Audio feature extraction plays an important role in analyzing and characterizing audio content. Auditory scene analysis, content-based retrieval, indexing, and fingerprinting of audio are a few of the applications that require efficient feature extraction. The key to extracting strong features that characterize the complex nature of audio signals is to identify their discriminatory subspaces. In this paper, we propose an audio feature extraction and multigroup classification scheme that focuses on identifying discriminatory time-frequency subspaces using the local discriminant bases (LDB) technique. Two dissimilarity measures were used in the process of selecting the LDB nodes and extracting features from them. The extracted features were then fed to a linear discriminant analysis-based classifier for a three-level hierarchical classification of audio signals into ten classes. In the first level, the audio signals were grouped into artificial and natural sounds. Each of the first-level groups was subdivided to form the second-level groups, viz. instrumental, automobile, human, and nonhuman sounds. The third level was formed by subdividing the four groups of the second level into the final ten groups (drums, flute, piano, aircraft, helicopter, male, female, animals, birds, and insects). A database of 213 audio signals was used in this study, and average classification accuracies of 83% for the first level (113 artificial and 100 natural sounds), 92% for the second level (73 instrumental and 40 automobile sounds; 40 human and 60 nonhuman sounds), and 89% for the third level (27 drums, 15 flute, and 31 piano sounds; 23 aircraft and 17 helicopter sounds; 20 male and 20 female speech; 20 animal, 20 bird, and 20 insect sounds) were achieved. In addition, a separate classification was performed combining the LDB features with the mel-frequency cepstral coefficients. The average classification accuracies achieved using the combined features were 91% for the first level, 99% for the second level, and 95% for the third level.
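The sketch below illustrates, under stated assumptions, the kind of pipeline the abstract describes: wavelet-packet decomposition of each signal, selection of discriminative time-frequency nodes by a class-dissimilarity score, and classification of the resulting features with LDA. It is not the authors' implementation: the fixed-level decomposition, the squared-difference dissimilarity surrogate, and the use of PyWavelets and scikit-learn are all illustrative choices (the paper searches a full LDB tree using two specific dissimilarity measures).

```python
# Minimal LDB-style sketch (illustrative only, not the paper's algorithm):
# wavelet-packet node energies -> dissimilarity-ranked node selection -> LDA.
import numpy as np
import pywt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def node_energies(signal, wavelet="db4", level=4):
    """Normalized energy of every wavelet-packet node at a fixed level."""
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
    nodes = wp.get_level(level, order="natural")
    e = np.array([np.sum(n.data ** 2) for n in nodes])
    return e / (e.sum() + 1e-12)

def select_ldb_nodes(signals, labels, n_nodes=8, **kw):
    """Rank nodes by dissimilarity between class-average energy maps.

    The paper uses two specific dissimilarity measures; the pairwise
    squared difference here is a stand-in purely for illustration.
    """
    E = np.array([node_energies(s, **kw) for s in signals])
    labels = np.asarray(labels)
    means = np.array([E[labels == c].mean(axis=0) for c in np.unique(labels)])
    score = ((means[:, None, :] - means[None, :, :]) ** 2).sum(axis=(0, 1))
    return np.argsort(score)[::-1][:n_nodes]

def extract_features(signals, nodes, **kw):
    """Energies of the selected discriminative nodes, per signal."""
    return np.array([node_energies(s, **kw)[nodes] for s in signals])

# Toy usage: a tonal class vs. a noise class of synthetic signals.
rng = np.random.default_rng(0)
sigs = [np.sin(0.3 * np.arange(512)) + 0.1 * rng.standard_normal(512)
        for _ in range(20)]
sigs += [rng.standard_normal(512) for _ in range(20)]
y = [0] * 20 + [1] * 20
nodes = select_ldb_nodes(sigs, y)
X = extract_features(sigs, nodes)
clf = LinearDiscriminantAnalysis().fit(X, y)
print("training accuracy:", clf.score(X, y))
```

A hierarchical classifier as in the paper would simply repeat this node-selection and LDA step at each level of the class tree (artificial vs. natural, then the subgroups), training one such model per split.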
Pages: 1236-1246
Number of pages: 11