Gammatone Cepstral Coefficients: Biologically Inspired Features for Non-Speech Audio Classification

Cited by: 190
Authors
Valero, Xavier [1]
Alias, Francesc [1]
Affiliations
[1] La Salle Univ Ramon Llull, GTM Grp Recerca Tecnol Media, Barcelona 08022, Spain
Keywords
Audio classification; audio scene recognition; environmental sound; feature extraction; Gammatone cepstral coefficients; ENVIRONMENTAL SOUND RECOGNITION; FREQUENCY
DOI
10.1109/TMM.2012.2199972
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In the context of non-speech audio recognition and classification for multimedia applications, it becomes essential to have a set of features able to accurately represent and discriminate among audio signals. Mel frequency cepstral coefficients (MFCC) have become a de facto standard for audio parameterization. Taking as a basis the MFCC computation scheme, the Gammatone cepstral coefficients (GTCCs) are a biologically inspired modification employing Gammatone filters with equivalent rectangular bandwidth bands. In this letter, the GTCCs, which have been previously employed in the field of speech research, are adapted for non-speech audio classification purposes. Their performance is evaluated on two audio corpora of 4 h each (general sounds and audio scenes), following two cross-validation schemes and four machine learning methods. According to the results, classification accuracies are significantly higher when employing GTCC rather than other state-of-the-art audio features. As a detailed analysis shows, with a similar computational cost, the GTCC are more effective than MFCC in representing the spectral characteristics of non-speech audio signals, especially at low frequencies.
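The scheme the abstract outlines (the MFCC pipeline with the mel filterbank swapped for Gammatone filters on equivalent rectangular bandwidth bands) can be illustrated with a minimal sketch for a single frame. This is not the authors' implementation. It assumes the Glasberg and Moore ERB approximation, a frequency-domain approximation of a fourth-order Gammatone magnitude response, and an unnormalized DCT-II; the function names, the 24 bands, the 13 coefficients, and the 50 Hz lower edge are all hypothetical choices made for illustration. A naive DFT keeps the sketch dependency-free; an FFT library would be used in practice.

```python
import cmath
import math


def erb_bandwidth(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore approximation)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def erb_spaced_centers(f_low, f_high, n_bands):
    """Center frequencies equally spaced on the ERB-rate scale."""
    def to_erb(f):
        return 21.4 * math.log10(1.0 + 0.00437 * f)

    def from_erb(e):
        return (10.0 ** (e / 21.4) - 1.0) / 0.00437

    lo, hi = to_erb(f_low), to_erb(f_high)
    step = (hi - lo) / (n_bands - 1)
    return [from_erb(lo + i * step) for i in range(n_bands)]


def power_spectrum(frame):
    """Hamming-windowed power spectrum via a naive DFT (bins 0..N/2)."""
    n = len(frame)
    x = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
         for i, s in enumerate(frame)]
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def gtcc_frame(frame, sample_rate, n_bands=24, n_coeffs=13, f_low=50.0, order=4):
    """Cepstral coefficients of one frame from log Gammatone band energies."""
    power = power_spectrum(frame)
    bin_hz = sample_rate / len(frame)
    log_energies = []
    for fc in erb_spaced_centers(f_low, sample_rate / 2.0, n_bands):
        # Assumed approximation: |H(f)|^2 ~ (1 + ((f - fc) / b)^2) ** (-order)
        b = 1.019 * erb_bandwidth(fc)
        energy = sum(p / (1.0 + ((k * bin_hz - fc) / b) ** 2) ** order
                     for k, p in enumerate(power))
        log_energies.append(math.log(energy + 1e-12))
    # Unnormalized DCT-II decorrelates the log band energies, as in MFCC.
    return [sum(le * math.cos(math.pi * c * (2 * m + 1) / (2 * n_bands))
                for m, le in enumerate(log_energies))
            for c in range(n_coeffs)]


# Usage: a 256-sample frame of a 440 Hz tone sampled at 16 kHz.
frame = [math.sin(2 * math.pi * 440.0 * t / 16000.0) for t in range(256)]
coeffs = gtcc_frame(frame, 16000)
```

Because ERB-spaced bands are narrower and more densely packed at low frequencies than at high ones, a filterbank built this way gives finer resolution in the low-frequency region the abstract singles out.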
Pages: 1684-1689
Page count: 6