Bird Species Classification with Audio-Visual Data using CNN and Multiple Kernel Learning

Cited by: 5

Authors:

Bold, Naranchimeg [1]

Zhang, Chao [2]

Akashi, Takuya [1]

Affiliations:

[1] Iwate Univ, Morioka, Iwate, Japan

[2] Univ Fukui, Fukui, Japan

Source:

2019 INTERNATIONAL CONFERENCE ON CYBERWORLDS (CW) | 2019

Keywords:

bird species classification; multimodal fusion; feature combination; multiple kernel learning; FEATURES;

DOI:

10.1109/CW.2019.00022

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

Deep convolutional neural networks (CNNs) have recently become a standard tool in many machine learning applications, in audio processing as well as in image processing. However, most studies explore only a single type of training data. In this paper, we present a study on classifying bird species by combining deep neural features of both visual and audio data using a kernel-based fusion method. Specifically, we extract deep neural features from the activation values of an inner layer of a CNN and combine these features by multiple kernel learning (MKL) to perform the final classification. In the experiments, we train and evaluate our method on the standard CUB-200-2011 data set combined with an audio data set that we collected ourselves, covering 200 bird species (classes). The experimental results indicate that our CNN+MKL method, which uses both categories of data, outperforms single-modality methods, some simple kernel combination methods, and the conventional early fusion method.
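The kernel-based fusion idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the random vectors stand in for CNN activations of the visual and audio inputs, the kernel weights come from a simple kernel-target alignment heuristic instead of a learned MKL solver, and a kernel ridge classifier replaces whatever classifier the paper uses. The function names (`rbf_kernel`, `alignment_weights`, `fit_kernel_ridge`, `run_demo`) are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * (A @ B.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))


def alignment_weights(kernels, y):
    """Heuristic kernel weights from kernel-target alignment.

    Assumption: this stands in for MKL, which would learn the weights
    jointly with the classifier.
    """
    # Ideal kernel: +1 for same-class pairs, -1 otherwise.
    Y = np.where(y[:, None] == y[None, :], 1.0, -1.0)
    scores = np.array([
        max((K * Y).sum() / (np.linalg.norm(K) * np.linalg.norm(Y)), 0.0)
        for K in kernels
    ])
    if scores.sum() == 0.0:
        # Fall back to uniform weights if no kernel aligns with the labels.
        return np.full(len(kernels), 1.0 / len(kernels))
    return scores / scores.sum()


def fit_kernel_ridge(K, y, n_classes, lam=1e-2):
    """Kernel ridge regression on one-hot targets (a simple kernel classifier)."""
    targets = np.eye(n_classes)[y]
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), targets)


def run_demo(seed=0, n_classes=3):
    rng = np.random.default_rng(seed)
    # Hypothetical stand-ins for per-class CNN feature centers in each modality.
    centers_v = 3.0 * rng.normal(size=(n_classes, 8))  # "visual" features
    centers_a = 3.0 * rng.normal(size=(n_classes, 6))  # "audio" features

    def sample(n_per_class):
        y = np.repeat(np.arange(n_classes), n_per_class)
        Xv = centers_v[y] + rng.normal(size=(y.size, 8))
        Xa = centers_a[y] + rng.normal(size=(y.size, 6))
        return Xv, Xa, y

    Xv_tr, Xa_tr, y_tr = sample(30)
    Xv_te, Xa_te, y_te = sample(10)

    # One kernel per modality; gamma = 1 / feature dimension is an arbitrary choice.
    gv, ga = 1.0 / 8, 1.0 / 6
    train_kernels = [rbf_kernel(Xv_tr, Xv_tr, gv), rbf_kernel(Xa_tr, Xa_tr, ga)]
    test_kernels = [rbf_kernel(Xv_te, Xv_tr, gv), rbf_kernel(Xa_te, Xa_tr, ga)]

    # Fuse the modalities as a convex combination of their kernels.
    w = alignment_weights(train_kernels, y_tr)
    K_tr = sum(wi * K for wi, K in zip(w, train_kernels))
    K_te = sum(wi * K for wi, K in zip(w, test_kernels))

    alpha = fit_kernel_ridge(K_tr, y_tr, n_classes)
    pred = (K_te @ alpha).argmax(axis=1)
    return w, float((pred == y_te).mean())


if __name__ == "__main__":
    weights, accuracy = run_demo()
    print("kernel weights:", weights)
    print("test accuracy:", accuracy)
```

The point of the sketch is the structural difference from early fusion: instead of concatenating the two feature vectors before classification, each modality gets its own kernel, and the classifier sees only their weighted sum.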


Pages: 85-88

Page count: 4

