Acoustic auto-encoders for biodiversity assessment

Cited by: 15

Authors:

Rowe, Benjamin [1]

Eichinski, Philip [1]

Zhang, Jinglan [1]

Roe, Paul [1]

Affiliation:

[1] Queensland University of Technology, Brisbane, Qld, Australia

Source:

Ecological Informatics | 2021 / Vol. 62

Keywords:

Eco-acoustics; Deep learning; Auto-encoders

DOI:

10.1016/j.ecoinf.2021.101237

Chinese Library Classification (CLC) number:

Q14 [Ecology (Bioecology)]

Discipline classification codes:

071012; 0713

Abstract:

Continuous audio recordings are playing an ever more important role in conservation and biodiversity monitoring; however, listening to these recordings is often infeasible, as they can be thousands of hours long. Automating analysis using machine learning algorithms requires a feature representation. In this paper we propose a technique for learning a general feature representation from unlabelled audio using auto-encoders, which can be used for analysing environmental audio on a small timescale. We start by segmenting the audio data into non-overlapping 1-s long chunks and generating audio spectrograms. These audio spectrograms are then used to train a basic auto-encoder, with the output of the encoder network being used to generate the feature representation. We have found that at a 1-s timescale, our feature representation offers marginal improvements over "acoustic indices", a common representation for analysing environmental audio.
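The pipeline described in the abstract (1-s non-overlapping chunks, spectrograms, a basic auto-encoder, encoder output as features) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the synthetic signal, the 2 kHz sample rate, the STFT settings, the single tanh hidden layer of 16 units, the input scaling, and the plain gradient-descent training loop. The helper names (`segment`, `spectrogram`, `flatten_and_scale`, `train_autoencoder`) are hypothetical.

```python
import numpy as np

def segment(signal, sr, seconds=1.0):
    """Split a 1-D signal into non-overlapping chunks; drop the partial tail."""
    n = int(sr * seconds)
    usable = (len(signal) // n) * n
    return signal[:usable].reshape(-1, n)

def spectrogram(chunk, n_fft=128, hop=64):
    """Log-magnitude STFT of one chunk (Hann window). Settings are assumed."""
    window = np.hanning(n_fft)
    starts = range(0, len(chunk) - n_fft + 1, hop)
    frames = np.stack([chunk[s:s + n_fft] * window for s in starts])
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))

def flatten_and_scale(specs):
    """Flatten each spectrogram, standardise per feature, then divide by
    sqrt(dim) so each chunk vector has roughly unit norm (assumed scaling
    that keeps plain gradient descent stable)."""
    X = specs.reshape(len(specs), -1)
    X = (X - X.mean(0)) / (X.std(0) + 1e-8)
    return X / np.sqrt(X.shape[1])

def train_autoencoder(X, n_hidden=16, lr=0.02, epochs=500, seed=0):
    """One-hidden-layer auto-encoder (tanh encoder, linear decoder), trained by
    full-batch gradient descent on the per-chunk squared reconstruction error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.5, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.01, (n_hidden, d)); b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)           # encoder output (the features)
        err = (H @ W2 + b2) - X            # reconstruction error
        losses.append(float(np.mean(np.sum(err ** 2, axis=1))))
        gR = 2.0 * err / n                 # dLoss/dReconstruction
        gZ = (gR @ W2.T) * (1.0 - H ** 2)  # back through tanh
        W2 -= lr * (H.T @ gR); b2 -= lr * gR.sum(0)
        W1 -= lr * (X.T @ gZ); b1 -= lr * gZ.sum(0)
    encode = lambda A: np.tanh(A @ W1 + b1)
    return encode, losses

# Synthetic stand-in for a field recording: 10.5 s, one tone per second plus noise.
sr = 2000
t = np.arange(int(10.5 * sr)) / sr
freq = 100 + 80 * np.floor(t)              # stays below the 1 kHz Nyquist limit
rng = np.random.default_rng(1)
signal = np.sin(2 * np.pi * freq * t) + 0.1 * rng.normal(size=t.size)

chunks = segment(signal, sr)                        # one row per 1-s chunk
specs = np.stack([spectrogram(c) for c in chunks])  # one spectrogram per chunk
X = flatten_and_scale(specs)
encode, losses = train_autoencoder(X)
features = encode(X)                                # one feature vector per chunk
```

In this sketch `features` holds one 16-dimensional vector per 1-s chunk, which is the role the abstract assigns to the encoder output. A real analysis would likely use a deeper network and a standard deep learning framework; the hand-written gradients are only meant to keep the example dependency-free.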


Pages: 12
