Per-Channel Energy Normalization: Why and How

Cited by: 58

Authors:

Lostanlen, Vincent ^{[1]}

Salamon, Justin ^{[2]}

Cartwright, Mark ^{[2]}

McFee, Brian ^{[2]}

Farnsworth, Andrew ^{[1]}

Kelling, Steve ^{[1]}

Bello, Juan Pablo ^{[2]}

Affiliations:

[1] Cornell Univ, Cornell Lab Ornithol, Ithaca, NY 14850 USA

[2] NYU, Brooklyn, NY 11201 USA

Source:

IEEE SIGNAL PROCESSING LETTERS | 2019 / Vol. 26 / No. 01

Funding:

National Science Foundation (USA);

Keywords:

Acoustic noise; acoustic sensors; acoustic signal detection; signal classification; spectrogram; CONVOLUTIONAL NEURAL-NETWORKS;

DOI:

10.1109/LSP.2018.2878620

Chinese Library Classification:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

In the context of automatic speech recognition and acoustic event detection, an adaptive procedure named per-channel energy normalization (PCEN) has recently been shown to outperform the pointwise logarithm of mel-frequency spectrogram (logmel-spec) as an acoustic frontend. This letter investigates the adequacy of PCEN for spectrogram-based pattern recognition in far-field noisy recordings, both from theoretical and practical standpoints. First, we apply PCEN to various datasets of natural acoustic environments and find empirically that it Gaussianizes distributions of magnitudes while decorrelating frequency bands. Second, we describe the asymptotic regimes of each component in PCEN: temporal integration, gain control, and dynamic range compression. Third, we give practical advice for adapting PCEN parameters to the temporal properties of the noise to be mitigated, the signal to be enhanced, and the choice of time-frequency representation. As it converts a large class of real-world soundscapes into additive white Gaussian noise, PCEN is a computationally efficient frontend for robust detection and classification of acoustic events in heterogeneous environments.
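The three components named in the abstract (temporal integration, gain control, dynamic range compression) can be illustrated with a minimal sketch. This record does not reproduce the paper's equations, so the code below assumes the commonly used PCEN form: a first-order low-pass smoother M per channel, division of the energy E by (eps + M)^alpha, then the compression (x + delta)^r - delta^r. The function name `pcen`, the parameter defaults, and the choice to initialize the smoother with the first frame are illustrative assumptions, not values taken from this record.

```python
def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Sketch of PCEN on a spectrogram E: a list of frames, each a list of
    non-negative per-channel energies. All defaults are illustrative."""
    if not E:
        return []
    # Smoother state, initialized to the first frame (an assumption).
    M = list(E[0])
    out = []
    for frame in E:
        # Temporal integration: first-order IIR low-pass filter per channel.
        M = [(1.0 - s) * m + s * e for m, e in zip(M, frame)]
        # Gain control (divide by smoothed energy raised to alpha),
        # then dynamic range compression with offset delta and exponent r.
        out.append([(e / (eps + m) ** alpha + delta) ** r - delta ** r
                    for e, m in zip(frame, M)])
    return out


# Two stationary channels whose energies differ by a factor of 100.
frames = [[1.0, 100.0] for _ in range(5)]
normalized = pcen(frames, alpha=1.0, eps=0.0)
```

With alpha set to 1 and a stationary input, the gain control cancels the per-channel level, so both channels of `normalized` take the same value despite the hundredfold difference in input energy. This is the level-equalizing behavior that the abstract attributes to the gain control stage.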

Pages: 39-43

Page count: 5

References: 32 in total
