DEEP NEURAL NETWORK DERIVED BOTTLENECK FEATURES FOR ACCURATE AUDIO CLASSIFICATION

Cited by: 0
Authors
Zhang, Bihong [1 ]
Xie, Lei [1 ,2 ]
Yuan, Yougen [2 ]
Ming, Huaiping [3 ]
Huang, Dongyan [3 ]
Song, Mingli [4 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Software & Microelect, Xian, Peoples R China
[2] Northwestern Polytech Univ, Sch Comp Sci, Xian, Peoples R China
[3] ASTAR, Inst Infocomm Res, Singapore, Singapore
[4] Zhejiang Univ, Coll Comp Sci, Hangzhou, Zhejiang, Peoples R China
Source
2016 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO WORKSHOPS (ICMEW) | 2016
Keywords
deep neural networks; audio classification; bottleneck features;
DOI
Not available
Chinese Library Classification
TP3 [Computing Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
In this paper, we propose to use a deep neural network (DNN) as an effective tool for audio feature extraction. The DNN-derived features can be effectively used in a subsequent classifier (e.g., an SVM in this study) for audio classification. Specifically, we learn bottleneck features from a multi-layer perceptron (MLP), in which Mel filter bank features are used as network input and one of the hidden layers has a small number of hidden units compared to the other hidden layers. This narrow hidden layer serves as a bottleneck layer, creating a constriction in the network that forces the information pertinent to classification into a compact feature representation. We study both unsupervised and supervised bottleneck feature extraction methods and demonstrate that the supervised bottleneck features outperform conventional hand-crafted features and achieve state-of-the-art performance in audio classification.
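The pipeline described in the abstract can be approximated by the following minimal sketch (not the authors' released code): a supervised MLP with a narrow bottleneck layer is trained on frame-level Mel filter bank features against class labels, and the bottleneck activations are then used to train an SVM. The layer sizes, feature dimensionality, number of classes, and toy data below are assumptions chosen purely for illustration.

```python
# Illustrative sketch of supervised bottleneck feature extraction + SVM classification.
# All hyperparameters (layer sizes, 40-dim filter bank input, 5 classes) are assumed.
import torch
import torch.nn as nn
from sklearn.svm import SVC

class BottleneckMLP(nn.Module):
    def __init__(self, n_fbank=40, n_hidden=1024, n_bottleneck=64, n_classes=5):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_fbank, n_hidden), nn.Sigmoid(),
            nn.Linear(n_hidden, n_hidden), nn.Sigmoid(),
            nn.Linear(n_hidden, n_bottleneck), nn.Sigmoid(),  # narrow bottleneck layer
        )
        self.decoder = nn.Sequential(
            nn.Linear(n_bottleneck, n_hidden), nn.Sigmoid(),
            nn.Linear(n_hidden, n_classes),  # class logits (supervised training target)
        )

    def forward(self, x):
        z = self.encoder(x)            # compact bottleneck features
        return self.decoder(z), z

# Toy data standing in for frame-level Mel filter bank features and class labels.
x = torch.randn(2000, 40)
y = torch.randint(0, 5, (2000,))

model = BottleneckMLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for _ in range(20):                    # brief supervised training of the MLP
    logits, _ = model(x)
    loss = loss_fn(logits, y)
    opt.zero_grad(); loss.backward(); opt.step()

# Extract bottleneck features and train the downstream SVM classifier on them.
with torch.no_grad():
    _, feats = model(x)
svm = SVC(kernel='rbf').fit(feats.numpy(), y.numpy())
```

For the unsupervised variant mentioned in the abstract, the same bottleneck network could instead be trained as an autoencoder to reconstruct its input rather than to predict class labels, with the bottleneck activations again feeding the SVM.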
Pages: 6