Audio Classification of Bird Species: A Statistical Manifold Approach

Cited by: 44
Authors
Briggs, Forrest [1]
Raich, Raviv [1]
Fern, Xiaoli Z. [1]
Affiliations
[1] Oregon State Univ, Sch EECS, Corvallis, OR 97331 USA
Source
2009 9TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING | 2009
DOI
10.1109/ICDM.2009.65
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Our goal is to automatically identify which species of bird is present in an audio recording using supervised learning. Devising effective algorithms for bird species classification is a preliminary step toward extracting useful ecological data from recordings collected in the field. We propose a probabilistic model for audio features within a short interval of time, then derive its Bayes risk-minimizing classifier, and show that it is closely approximated by a nearest-neighbor classifier using Kullback-Leibler divergence to compare histograms of features. We note that feature histograms can be viewed as points on a statistical manifold, and KL divergence approximates geodesic distances defined by the Fisher information metric on such manifolds. Motivated by this fact, we propose the use of another approximation to the Fisher information metric, namely the Hellinger metric. The proposed classifiers achieve over 90% accuracy on a data set containing six species of bird, and outperform support vector machines.
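The nearest-neighbor rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy histograms, the smoothing constant `eps`, the direction of the KL divergence (query against training example), and the 1/2 normalization of the Hellinger distance are all assumptions made for illustration, and the names `normalize`, `kl_divergence`, `hellinger`, and `nearest_neighbor` are hypothetical.

```python
import math

def normalize(counts, eps=1e-9):
    """Turn raw bin counts into a probability histogram.

    eps is an assumed smoothing constant that keeps every bin positive,
    so the logarithm in the KL divergence is always defined.
    """
    total = sum(counts) + eps * len(counts)
    return [(c + eps) / total for c in counts]

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) between two histograms."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def hellinger(p, q):
    """Hellinger distance, using the convention that bounds it by 1."""
    squared = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(0.5 * squared)

def nearest_neighbor(query, training, distance):
    """Return the label of the training histogram closest to the query.

    training is a list of (histogram, label) pairs.
    """
    _, label = min(training, key=lambda item: distance(query, item[0]))
    return label

# Toy data (invented): feature histograms with three bins.
training = [
    (normalize([8, 1, 1]), "species_a"),
    (normalize([1, 1, 8]), "species_b"),
]
query = normalize([6, 3, 1])

print(nearest_neighbor(query, training, kl_divergence))
print(nearest_neighbor(query, training, hellinger))
```

Swapping `kl_divergence` for `hellinger` changes only the dissimilarity passed to the same nearest-neighbor rule, which mirrors how the abstract presents the two as alternative approximations related to the Fisher information metric.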
Pages: 51-60
Number of pages: 10
Related papers
34 in total
  • [1] [Anonymous], P SPECOM 2005
  • [2] ARTHUR D, 2007, SODA 2007
  • [3] The importance of spatial autocorrelation, extent and resolution in predicting forest bird occurrence
    Betts, MG
    Diamond, AW
    Forbes, GJ
    Villard, MA
    Gunn, JS
    [J]. ECOLOGICAL MODELLING, 2006, 191 (02) : 197 - 224
  • [4] Catchpole CK., 2003, Bird song: biological themes and variations
  • [5] LIBSVM: A Library for Support Vector Machines
    Chang, Chih-Chung
    Lin, Chih-Jen
    [J]. ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2011, 2 (03)
  • [6] Semi-automatic classification of bird vocalizations using spectral peak tracks
    Chen, Zhixin
    Maher, Robert C.
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2006, 120 (05) : 2974 - 2984
  • [7] CORTES C, 1995, MACH LEARN, V20, P273, DOI 10.1023/A:1022627411411
  • [8] Csurka G., 2004, Workshop on Statistical Learning in Computer Vision, ECCV, V1, P1, DOI 10.1234/12345678
  • [9] DASGUPTA A, 2009, INT STAT REV, V77, P160
  • [10] COMPARISON OF PARAMETRIC REPRESENTATIONS FOR MONOSYLLABIC WORD RECOGNITION IN CONTINUOUSLY SPOKEN SENTENCES
    DAVIS, SB
    MERMELSTEIN, P
    [J]. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1980, 28 (04) : 357 - 366