Real-Time monophonic and polyphonic audio classification from power spectra

Cited by: 7

Authors:

Baelde, Maxime [1, 2]

Biernacki, Christophe [2]

Greff, Raphael [1]

Affiliations:

[1] A-Volute, 19 Rue Ladrie, F-59491 Villeneuve-d'Ascq, France

[2] Univ Lille, INRIA, Modal team, CNRS, UMR 8524, Lab Paul Painlevé, F-59000 Lille, France

Source:

PATTERN RECOGNITION | 2019 / Vol. 92

Keywords:

Real-time; Audio classification; Machine learning; Monophonic; Polyphonic; Generative model; Nonparametric estimation; MODEL;

DOI:

10.1016/j.patcog.2019.03.017

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This work addresses the recurring challenge of real-time monophonic and polyphonic audio source classification. The whole normalized power spectrum (NPS) is used directly in the proposed process, avoiding complex and hazardous traditional feature extraction. The NPS is also a natural candidate for polyphonic events, thanks to its additive property in such cases. Classification is performed through nonparametric, kernel-based generative modeling of the power spectrum. The advantage of this model is twofold: it is almost hypothesis-free, and it directly yields the maximum a posteriori (MAP) classification rule for online signals. Moreover, it uses the monophonic dataset to build the polyphonic one. To reach the real-time target, the complexity of the method can be tuned by a standard hierarchical clustering preprocessing of the prototypes, which gives a particularly efficient trade-off between computation time and classification accuracy. The proposed method, called RARE (Real-time Audio Recognition Engine), shows encouraging results in both monophonic and polyphonic classification tasks on benchmark and in-house datasets, including the targeted real-time setting. In particular, compared with state-of-the-art methods it offers several advantages: reduced training time, no feature extraction, control over the computation-accuracy trade-off, and no need to train on already-mixed sounds for polyphonic classification. (C) 2019 Elsevier Ltd. All rights reserved.
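The abstract's core idea (a nonparametric, kernel-based generative model of the normalized power spectrum, combined with a MAP decision rule, and polyphonic prototypes built from monophonic ones via spectral additivity) can be illustrated with a minimal Python sketch. It is not the RARE implementation. The Gaussian kernel, the bandwidth value, the convex mixing of prototypes, the synthetic sine-tone data, and all names (`normalized_power_spectrum`, `mix_spectra`, `KernelMAPClassifier`, `tone`) are hypothetical choices made for illustration. The hierarchical clustering step that reduces the number of prototypes is omitted.

```python
import cmath
import math
from collections import defaultdict


def normalized_power_spectrum(frame):
    """Power spectrum of a real frame, normalized to sum to 1.

    Uses a naive O(N^2) DFT for clarity; a real system would use an FFT.
    Only bins 0..N//2 are kept because a real signal's spectrum is symmetric.
    """
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total == 0.0:
        raise ValueError("silent frame: power spectrum is all zeros")
    return [p / total for p in power]


def mix_spectra(nps_a, nps_b, weight=0.5):
    """Hypothetical polyphonic prototype: convex combination of two NPS.

    Assumes the power spectra of co-occurring sources add (cross terms
    neglected); `weight` is the assumed energy share of source A.
    """
    return [weight * a + (1.0 - weight) * b for a, b in zip(nps_a, nps_b)]


def gaussian_kernel(x, y, bandwidth):
    # Unnormalized Gaussian kernel: the normalizing constant is identical
    # for every class, so it cancels in the MAP arg max.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


class KernelMAPClassifier:
    """Parzen-window (kernel density) generative classifier with a MAP rule."""

    def __init__(self, bandwidth=0.05):
        self.bandwidth = bandwidth  # assumed value, not taken from the paper
        self.prototypes = defaultdict(list)
        self.priors = {}

    def fit(self, spectra, labels):
        # Every training spectrum is kept as a prototype of its class.
        for nps, label in zip(spectra, labels):
            self.prototypes[label].append(nps)
        total = len(labels)
        self.priors = {c: len(p) / total for c, p in self.prototypes.items()}
        return self

    def class_density(self, nps, label):
        # Kernel density estimate p(x | c): mean kernel value over prototypes.
        protos = self.prototypes[label]
        return sum(gaussian_kernel(nps, p, self.bandwidth) for p in protos) / len(protos)

    def predict(self, nps):
        # MAP rule: argmax_c P(c) * p(x | c); the evidence p(x) is a common factor.
        return max(self.prototypes, key=lambda c: self.priors[c] * self.class_density(nps, c))


def tone(bin_index, n=64, phase=0.0):
    """Synthetic test frame: a sine wave centered exactly on one DFT bin."""
    return [math.sin(2 * math.pi * bin_index * t / n + phase) for t in range(n)]


# Monophonic training data: two classes of pure tones.
low = [normalized_power_spectrum(tone(4, phase=p)) for p in (0.0, 0.5)]
high = [normalized_power_spectrum(tone(12, phase=p)) for p in (0.0, 0.5)]
# Polyphonic prototypes are built from monophonic ones only (no mixed recordings).
mixed = [mix_spectra(a, b) for a in low for b in high]
clf = KernelMAPClassifier(bandwidth=0.05).fit(
    low + high + mixed,
    ["low"] * len(low) + ["high"] * len(high) + ["low+high"] * len(mixed),
)
```

Here a chord such as `[a + b for a, b in zip(tone(4), tone(12))]` is classified as `"low+high"` even though no mixed signal was used for training. The cost of `predict` grows linearly with the number of prototypes, which is the quantity the abstract's hierarchical clustering preprocessing is meant to reduce.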

Pages: 82-92

Number of pages: 11

Number of references: 43
