Bird Species Classification with Audio-Visual Data using CNN and Multiple Kernel Learning

Cited by: 5
Authors
Bold, Naranchimeg [1]
Zhang, Chao [2]
Akashi, Takuya [1]
Affiliations
[1] Iwate Univ, Morioka, Iwate, Japan
[2] Univ Fukui, Fukui, Japan
Source
2019 INTERNATIONAL CONFERENCE ON CYBERWORLDS (CW) | 2019
Keywords
bird species classification; multimodal fusion; feature combination; multiple kernel learning; FEATURES;
DOI
10.1109/CW.2019.00022
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Deep convolutional neural networks (CNNs) have recently become a standard in many machine learning applications, not only in image processing but also in audio processing. However, most studies explore only a single type of training data. In this paper, we present a study on classifying bird species by combining deep neural features of both visual and audio data using a kernel-based fusion method. Specifically, we extract deep neural features from the activation values of an inner layer of a CNN and combine them by multiple kernel learning (MKL) to perform the final classification. In the experiments, we train and evaluate our method on the standard CUB-200-2011 data set combined with an audio data set that we collected ourselves, covering 200 bird species (classes). The experimental results indicate that our CNN+MKL method, which uses both categories of data, outperforms single-modality methods, some simple kernel combination methods, and the conventional early fusion method.
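The kernel-based fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the features are synthetic stand-ins for CNN activations, the kernel weights come from a simple kernel-target alignment heuristic rather than an MKL solver, and the classifier is a kernel ridge model chosen for brevity. All function names (`rbf_kernel`, `alignment_weights`, `combine`, `fit_kernel_ridge`, `predict`) are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def alignment_weights(kernels, y):
    """Assumed heuristic: weight each kernel by its alignment with the
    ideal label kernel (1 where two samples share a class, else 0)."""
    ideal = (y[:, None] == y[None, :]).astype(float)
    scores = np.array([
        (K * ideal).sum() / (np.linalg.norm(K) * np.linalg.norm(ideal))
        for K in kernels
    ])
    return scores / scores.sum()

def combine(kernels, weights):
    """Weighted sum of kernel matrices (one per modality)."""
    return sum(w * K for w, K in zip(weights, kernels))

def fit_kernel_ridge(K, y, n_classes, lam=1e-2):
    """Kernel ridge regression onto one-hot targets (stand-in classifier)."""
    targets = np.eye(n_classes)[y]
    return np.linalg.solve(K + lam * np.eye(len(K)), targets)

def predict(K_test_train, alpha):
    """Predict the class with the largest regression output."""
    return (K_test_train @ alpha).argmax(1)

# Synthetic stand-ins for deep features of the two modalities.
rng = np.random.default_rng(0)
n_classes, n_per_class = 3, 20
y = np.repeat(np.arange(n_classes), n_per_class)
visual = rng.normal(size=(len(y), 8)) + 3.0 * y[:, None]
audio = rng.normal(size=(len(y), 4)) - 2.0 * y[:, None]

train = np.arange(len(y)) % 2 == 0
test = ~train

# One kernel per modality, computed on the training samples.
K_v = rbf_kernel(visual[train], visual[train], gamma=1.0 / 8)
K_a = rbf_kernel(audio[train], audio[train], gamma=1.0 / 4)
weights = alignment_weights([K_v, K_a], y[train])
K_train = combine([K_v, K_a], weights)

# Test-versus-train kernels must be combined with the same weights.
K_test = combine(
    [rbf_kernel(visual[test], visual[train], gamma=1.0 / 8),
     rbf_kernel(audio[test], audio[train], gamma=1.0 / 4)],
    weights,
)

alpha = fit_kernel_ridge(K_train, y[train], n_classes)
pred = predict(K_test, alpha)
print("kernel weights:", weights)
print("test accuracy:", (pred == y[test]).mean())
```

In contrast to early fusion, which would concatenate the visual and audio feature vectors before computing a single kernel, this sketch keeps one kernel per modality and learns how much each contributes.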
Pages: 85-88
Page count: 4