Speech and music classification using spectrogram based statistical descriptors and extreme learning machine

Cited by: 18

Authors:

Birajdar, Gajanan K. [1]

Patil, Mukesh D. [2]

Affiliations:

[1] Ramrao Adik Inst Technol, Dept Elect Engn, Navi Mumbai 400706, Maharashtra, India

[2] Ramrao Adik Inst Technol, Dept Elect & Telecommun Engn, Navi Mumbai 400706, Maharashtra, India

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2019 / Vol. 78 / Issue 11

Keywords:

IIR-CQT spectrogram; Nonsubsampled contourlet transform; Generalized Gaussian distribution; Chaos crow search algorithm; ELM classifier; GENERALIZED GAUSSIAN DENSITY; MAXIMUM A-POSTERIORI; CONTOURLET TRANSFORM; IMAGE RETRIEVAL; NEURAL-NETWORKS; TEXT DETECTION; DISCRIMINATION; ALGORITHMS; FEATURES; DESIGN;

DOI:

10.1007/s11042-018-6899-z

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

This article proposes a novel feature extraction approach for speech/music classification based on generalized Gaussian distribution descriptors extracted from an IIR-CQT spectrogram representation. The IIR-CQT spectrogram provides superior temporal resolution at high frequencies and better spectral resolution at low frequencies compared to conventional short-time Fourier transform analysis, which provides uniform frequency resolution. Multi-level decomposition of the spectrogram image is then performed using the Nonsubsampled Contourlet Transform (NSCT), which is a fully shift-invariant, multi-scale, and multi-direction expansion that can preserve the edges of the textural pattern of speech and music. The generalized Gaussian distribution (GGD) parameters are estimated from the NSCT subbands using maximum likelihood estimation (MLE) to create the image feature descriptor. The chaos crow search algorithm is employed to choose the most relevant feature subset and discard redundant features, and finally the extreme learning machine classifier categorizes each input audio segment as speech or music. The experimental results show that the proposed feature descriptor is effective and performs better than existing approaches in speech/music classification. In addition, results for mismatched training and testing conditions are also presented.
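
The final classification stage named in the abstract can be illustrated with a minimal sketch of a generic extreme learning machine: a single hidden layer with random, fixed input weights and output weights solved in closed form by a pseudo-inverse. This is not the authors' implementation. The class name `ELM`, the helper `make_toy_data`, the hidden-layer size, and the synthetic Gaussian features (standing in for the GGD descriptors, which are not reproduced here) are all assumptions made for illustration.

```python
import numpy as np


class ELM:
    """Generic two-class extreme learning machine (illustrative sketch only)."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden  # assumed size, not taken from the paper
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activations of the random hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        n_features = X.shape[1]
        # Input weights and biases are drawn once and never trained.
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        T = np.eye(2)[y]  # one-hot targets: 0 = speech, 1 = music
        # Output weights: least-squares solution via the Moore-Penrose inverse.
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)


def make_toy_data(n_per_class=200, n_features=8, seed=1):
    """Hypothetical stand-in for feature vectors; two shifted Gaussian clouds."""
    rng = np.random.default_rng(seed)
    X0 = rng.normal(loc=-1.0, size=(n_per_class, n_features))
    X1 = rng.normal(loc=1.0, size=(n_per_class, n_features))
    X = np.vstack([X0, X1])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y


if __name__ == "__main__":
    X_train, y_train = make_toy_data(seed=1)
    X_test, y_test = make_toy_data(seed=2)
    model = ELM(n_hidden=50, seed=0).fit(X_train, y_train)
    accuracy = (model.predict(X_test) == y_test).mean()
    print(f"held-out accuracy on toy data: {accuracy:.3f}")
```

Because only the output weights are fitted, training reduces to one matrix pseudo-inverse, which is the usual reason given for the speed of this classifier family. The feature selection and NSCT/GGD stages described in the abstract are outside the scope of this sketch.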


Pages: 15141-15168

Page count: 28

