DEEP NEURAL NETWORK DERIVED BOTTLENECK FEATURES FOR ACCURATE AUDIO CLASSIFICATION

Cited by: 0
Authors
Zhang, Bihong [1 ]
Xie, Lei [1 ,2 ]
Yuan, Yougen [2 ]
Ming, Huaiping [3 ]
Huang, Dongyan [3 ]
Song, Mingli [4 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Software & Microelect, Xian, Peoples R China
[2] Northwestern Polytech Univ, Sch Comp Sci, Xian, Peoples R China
[3] ASTAR, Inst Infocomm Res, Singapore, Singapore
[4] Zhejiang Univ, Coll Comp Sci, Hangzhou, Zhejiang, Peoples R China
Source
2016 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO WORKSHOPS (ICMEW) | 2016
Keywords
deep neural networks; audio classification; bottleneck features;
DOI
Not available
Chinese Library Classification
TP3 [Computing Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
In this paper, we propose to use a deep neural network (DNN) as an effective tool for audio feature extraction. The DNN-derived features can be effectively used in a subsequent classifier (e.g., an SVM in this study) for audio classification. Specifically, we learn bottleneck features from a multi-layer perceptron (MLP), in which Mel filter bank features are used as network input and one of the hidden layers has a small number of hidden units compared to the other hidden layers. This narrow hidden layer serves as a bottleneck layer, creating a constriction in the network that forces the information pertinent to classification into a compact feature representation. We study both unsupervised and supervised bottleneck feature extraction methods and demonstrate that the supervised bottleneck features outperform conventional hand-crafted features and achieve state-of-the-art performance in audio classification.
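The pipeline described in the abstract can be approximated by the following minimal sketch (not the authors' released code): a supervised MLP with a narrow bottleneck layer is trained on frame-level Mel filter bank features against class labels, and the bottleneck activations are then used to train an SVM. The layer sizes, feature dimensionality, number of classes, and toy data below are assumptions chosen purely for illustration.

```python
# Illustrative sketch of supervised bottleneck feature extraction + SVM classification.
# All hyperparameters (layer sizes, 40-dim filter bank input, 5 classes) are assumed.
import torch
import torch.nn as nn
from sklearn.svm import SVC

class BottleneckMLP(nn.Module):
    def __init__(self, n_fbank=40, n_hidden=1024, n_bottleneck=64, n_classes=5):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_fbank, n_hidden), nn.Sigmoid(),
            nn.Linear(n_hidden, n_hidden), nn.Sigmoid(),
            nn.Linear(n_hidden, n_bottleneck), nn.Sigmoid(),  # narrow bottleneck layer
        )
        self.decoder = nn.Sequential(
            nn.Linear(n_bottleneck, n_hidden), nn.Sigmoid(),
            nn.Linear(n_hidden, n_classes),  # class logits (supervised training target)
        )

    def forward(self, x):
        z = self.encoder(x)            # compact bottleneck features
        return self.decoder(z), z

# Toy data standing in for frame-level Mel filter bank features and class labels.
x = torch.randn(2000, 40)
y = torch.randint(0, 5, (2000,))

model = BottleneckMLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for _ in range(20):                    # brief supervised training of the MLP
    logits, _ = model(x)
    loss = loss_fn(logits, y)
    opt.zero_grad(); loss.backward(); opt.step()

# Extract bottleneck features and train the downstream SVM classifier on them.
with torch.no_grad():
    _, feats = model(x)
svm = SVC(kernel='rbf').fit(feats.numpy(), y.numpy())
```

For the unsupervised variant mentioned in the abstract, the same bottleneck network could instead be trained as an autoencoder to reconstruct its input rather than to predict class labels, with the bottleneck activations again feeding the SVM.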
Pages: 6