Classification of audio signals using SVM and RBFNN

Cited by: 104
Authors
Dhanalakshmi, P. [1]
Palanivel, S. [1]
Ramalingam, V. [1]
Affiliations
[1] Annamalai Univ, Dept Comp Sci & Engn, Chidambaram 608002, Tamil Nadu, India
Keywords
Support vector machines; Radial basis function neural network; Linear predictive coefficients; Linear predictive cepstral coefficients; Mel-frequency cepstral coefficients; Segmentation
DOI
10.1016/j.eswa.2008.06.126
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In the age of digital information, audio data has become an important part of many modern computer applications, and audio classification has become a focus of research in audio processing and pattern recognition. Automatic audio classification is very useful for audio indexing, content-based audio retrieval and on-line audio distribution, but extracting the most common and salient themes from unstructured raw audio data remains a challenge. In this paper, we propose effective algorithms to automatically classify audio clips into one of six classes: music, news, sports, advertisement, cartoon and movie. For these categories, a number of acoustic features, including linear predictive coefficients, linear predictive cepstral coefficients and mel-frequency cepstral coefficients, are extracted to characterize the audio content. Support vector machines are applied to classify audio clips into their respective classes by learning from training data. The proposed method then extends the approach to a radial basis function neural network (RBFNN) for audio classification. An RBFNN applies a nonlinear transformation that maps the input into a higher-dimensional hidden space, followed by a linear transformation. Experiments on different genres within the various categories indicate that the classification results are significant and effective. (C) 2008 Elsevier Ltd. All rights reserved.
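The two-stage structure of an RBFNN described in the abstract (a nonlinear hidden layer followed by a linear output layer) can be illustrated with a toy sketch. This is not the authors' implementation and uses no audio data: the XOR inputs, the two hand-picked centers, the `gamma` width parameter, the gradient-descent settings and all function names are assumptions chosen only to keep the example small and self-contained. A realistic system would feed LPC, LPCC or MFCC feature vectors into a hidden layer with many more units than the two used here.

```python
import math


def rbf_features(x, centers, gamma=1.0):
    """Nonlinear step: Gaussian response of each hidden unit to input x."""
    return [
        math.exp(-gamma * sum((xi - ci) ** 2 for xi, ci in zip(x, c)))
        for c in centers
    ]


def train_output_layer(hidden, targets, lr=0.5, epochs=5000):
    """Linear step: fit output weights and bias by batch gradient descent
    on the mean squared error (an assumed, simple training rule)."""
    n = len(hidden)
    n_hidden = len(hidden[0])
    weights = [0.0] * n_hidden
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_hidden
        grad_b = 0.0
        for phi, t in zip(hidden, targets):
            err = sum(w * p for w, p in zip(weights, phi)) + bias - t
            for j, p in enumerate(phi):
                grad_w[j] += err * p / n
            grad_b += err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


def predict(x, centers, weights, bias, gamma=1.0):
    """Threshold the linear output at 0.5 to obtain a class label."""
    phi = rbf_features(x, centers, gamma)
    score = sum(w * p for w, p in zip(weights, phi)) + bias
    return 1 if score >= 0.5 else 0


# Toy data (assumption): XOR is not linearly separable in the input space,
# but becomes separable after the Gaussian hidden layer.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 1, 1, 0]
centers = [(0, 0), (1, 1)]  # hand-picked hidden-unit centers (assumption)

hidden = [rbf_features(x, centers) for x in inputs]
weights, bias = train_output_layer(hidden, targets)
predictions = [predict(x, centers, weights, bias) for x in inputs]
print(predictions)
```

The hidden layer does all the nonlinear work here; the output layer is an ordinary linear model, which is why it can be fitted with a simple least-squares style update.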
Pages: 6069-6075
Number of pages: 7