Real-time monophonic and polyphonic audio classification from power spectra

Cited by: 7
Authors
Baelde, Maxime [1,2]
Biernacki, Christophe [2]
Greff, Raphael [1]
Affiliations
[1] A Volute, 19 Rue Ladrie, F-59491 Villeneuve d'Ascq, France
[2] Univ Lille, INRIA, Modal team, CNRS, UMR 8524, Lab Paul Painlevé, F-59000 Lille, France
Keywords
Real-time; Audio classification; Machine learning; Monophonic; Polyphonic; Generative model; Nonparametric estimation; Model
DOI
10.1016/j.patcog.2019.03.017
CLC number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This work addresses the recurring challenge of real-time monophonic and polyphonic audio source classification. The whole normalized power spectrum (NPS) is used directly in the proposed process, avoiding complex and potentially unreliable traditional feature extraction. The NPS is also a natural candidate for polyphonic events thanks to its additive property in such cases. Classification is performed through nonparametric, kernel-based generative modeling of the power spectrum. This model has a twofold advantage: it is almost hypothesis-free, and it makes it straightforward to obtain the maximum a posteriori classification rule for online signals. Moreover, it uses the monophonic dataset to build the polyphonic one. To reach the real-time target, the complexity of the method can then be tuned by a standard hierarchical clustering preprocessing of the prototypes, which yields a particularly efficient trade-off between computation time and classification accuracy. The proposed method, called RARE (Real-time Audio Recognition Engine), shows encouraging results in both monophonic and polyphonic classification tasks on benchmark and in-house datasets, including the targeted real-time situation. In particular, it offers several advantages over state-of-the-art methods: reduced training time, no feature extraction, the ability to control the computation-accuracy trade-off, and no training on already-mixed sounds for polyphonic classification. (C) 2019 Elsevier Ltd. All rights reserved.
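The abstract names the ingredients of RARE without giving formulas. The following is a minimal Python sketch of how these ingredients could fit together. It is built on our own assumptions and is not the paper's implementation.

- The NPS is taken from a naive one-sided DFT.
- The class density is the mean of isotropic Gaussian kernels with a bandwidth `h` centred on that class's prototypes. The paper's actual kernel may differ.
- Class priors default to uniform.
- Two-source polyphonic prototypes are convex combinations of monophonic NPS at hypothetical energy shares `alphas`. This assumes power spectra add, which ignores cross terms between sources. The cross terms vanish in the demo only because the tones fall on distinct DFT bins.
- Prototype reduction uses centroid-linkage agglomerative clustering, standing in for the unspecified "standard hierarchical clustering".

All function names are hypothetical.

```python
import math
from itertools import combinations


def normalized_power_spectrum(frame):
    """One-sided power spectrum of a real frame (naive DFT), normalized to sum to 1."""
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        power.append(re * re + im * im)
    total = sum(power)
    if total == 0:
        raise ValueError("silent frame: NPS undefined")
    return [p / total for p in power]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def log_class_density(x, prototypes, h):
    """Log of the mean Gaussian kernel over a class's prototypes (assumed kernel).

    The normalizing constant is shared by all classes for a fixed h, so it is dropped."""
    logs = [-sq_dist(x, p) / (2 * h * h) for p in prototypes]
    m = max(logs)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(v - m) for v in logs)) - math.log(len(prototypes))


def map_classify(x, prototypes_by_class, h, priors=None):
    """MAP rule: argmax of log prior + log kernel density (uniform prior by default)."""
    def score(label):
        log_prior = math.log(priors[label]) if priors else 0.0
        return log_prior + log_class_density(x, prototypes_by_class[label], h)
    return max(prototypes_by_class, key=score)


def build_polyphonic(mono, alphas=(0.25, 0.5, 0.75)):
    """Two-source prototypes from monophonic ones, assuming power spectra add.

    Under that assumption a mixture's NPS is a convex combination of the sources' NPS;
    alpha is the assumed share of total energy from the first label in sorted order."""
    poly = {}
    for (la, pa_list), (lb, pb_list) in combinations(sorted(mono.items()), 2):
        poly[(la, lb)] = [[a * u + (1 - a) * v for u, v in zip(pa, pb)]
                          for pa in pa_list for pb in pb_list for a in alphas]
    return poly


def reduce_prototypes(prototypes, k):
    """Centroid-linkage agglomerative clustering: merge the closest pair until k remain."""
    clusters = [(list(p), 1) for p in prototypes]  # (centroid, size)
    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: sq_dist(clusters[ij[0]][0], clusters[ij[1]][0]))
        (ci, ni), (cj, nj) = clusters[i], clusters[j]
        merged = ([(ni * a + nj * b) / (ni + nj) for a, b in zip(ci, cj)], ni + nj)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return [c for c, _ in clusters]  # cluster sizes are dropped for simplicity


def sine(bin_index, n=32, amp=1.0, phase=0.0):
    """Test tone exactly on DFT bin `bin_index` of an n-sample frame."""
    return [amp * math.sin(2 * math.pi * bin_index * t / n + phase) for t in range(n)]


mono = {
    "low": [normalized_power_spectrum(sine(2, phase=p)) for p in (0.0, 0.5, 1.0)],
    "high": [normalized_power_spectrum(sine(8, phase=p)) for p in (0.0, 0.7, 1.4)],
}
protos = {(label,): ps for label, ps in mono.items()}
protos.update(build_polyphonic(mono))
protos = {label: reduce_prototypes(ps, 3) for label, ps in protos.items()}

mixed = [a + b for a, b in zip(sine(2, amp=0.8), sine(8, amp=0.6, phase=0.3))]
print(map_classify(normalized_power_spectrum(mixed), protos, h=0.1))  # -> ('high', 'low')
print(map_classify(normalized_power_spectrum(sine(8, phase=0.2)), protos, h=0.1))  # -> ('high',)
```

In the demo the mixture frame is assigned to the two-source class without any mixed recording in training. This illustrates, under the stated assumptions, the abstract's point that polyphonic prototypes can be built from monophonic ones. Lowering `k` in `reduce_prototypes` shrinks the number of kernel evaluations per frame, which is the computation-accuracy knob the abstract describes.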
Pages: 82-92
Page count: 11