FEDERATED SELF-SUPERVISED LEARNING FOR ACOUSTIC EVENT CLASSIFICATION

Times cited: 6
Authors
Feng, Meng [1 ]
Kao, Chieh-Chi [2 ]
Tang, Qingming [2 ]
Sun, Ming [2 ]
Rozgic, Viktor [2 ]
Matsoukas, Spyros [2 ]
Wang, Chao [2 ]
Affiliations
[1] MIT, Cambridge, MA 02139 USA
[2] Amazon Com Inc, Seattle, WA USA
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Keywords
Federated learning; representation learning; self-supervised learning; acoustic event classification;
DOI
10.1109/ICASSP43922.2022.9747472
Chinese Library Classification
O42 [Acoustics];
Discipline codes
070206; 082403;
Abstract
Standard acoustic event classification (AEC) solutions require large-scale collection of data from client devices for model optimization. Federated learning (FL) is a compelling framework that decouples data collection from model training to enhance customer privacy. In this work, we investigate the feasibility of applying FL to improve AEC performance when no customer data can be uploaded directly to the server. We assume no pseudo labels can be inferred from on-device user inputs, in line with the typical use cases of AEC. We adapt self-supervised learning to the FL framework for on-device continual learning of representations, which improves the downstream AEC classifiers without any labeled or pseudo-labeled data. Compared to the baseline without FL, the proposed method improves precision by up to 20.3% relative while maintaining recall. Our work differs from prior FL work in that our approach does not require user-generated learning targets, and the data we use is collected from our Beta program and de-identified, to simulate production settings as closely as possible.
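The core idea in the abstract — clients refine a shared representation on their own unlabeled on-device data, and only model weights (never audio) travel to the server — can be sketched with plain federated averaging over a self-supervised objective. The toy below is an illustrative assumption, not the paper's implementation: it uses a linear encoder, a reconstruction loss as the self-supervised signal, and synthetic "feature" data in place of real audio.

```python
# Illustrative FedAvg sketch with a self-supervised local objective (NOT the
# paper's method): each client runs a few SGD steps minimizing a reconstruction
# loss 0.5 * ||X W W^T - X||_F^2 on its private, unlabeled data X, then the
# server averages client weights, weighted by local dataset size.
import numpy as np

def local_sgd(W, X, lr=0.01, steps=10):
    """Local self-supervised training: a few SGD steps on the reconstruction loss."""
    for _ in range(steps):
        R = X @ W @ W.T - X                 # reconstruction residual
        grad = X.T @ R @ W + R.T @ X @ W    # gradient of 0.5*||R||^2 w.r.t. W
        W = W - lr * grad / len(X)
    return W

def fedavg_round(W_global, client_data, lr=0.01, steps=10):
    """One FL round: broadcast weights, train locally, average (size-weighted)."""
    updates = [local_sgd(W_global.copy(), X, lr, steps) for X in client_data]
    sizes = np.array([len(X) for X in client_data], dtype=float)
    return sum(w * (n / sizes.sum()) for w, n in zip(updates, sizes))

rng = np.random.default_rng(0)
d, k = 8, 3                                 # feature dim, representation dim
clients = [rng.normal(size=(32, d)) for _ in range(4)]  # unlabeled on-device data
W = rng.normal(scale=0.1, size=(d, k))

loss0 = np.mean([np.sum((X @ W @ W.T - X) ** 2) for X in clients])
for _ in range(20):                         # 20 federated rounds
    W = fedavg_round(W, clients)
loss1 = np.mean([np.sum((X @ W @ W.T - X) ** 2) for X in clients])
```

In the paper's setting the local objective would be a self-supervised audio representation loss and the downstream AEC classifier is trained separately on the server; the sketch only shows the privacy-preserving data flow, in which raw client data never leaves the device.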
Pages: 481-485
Page count: 5
References
27 in total
[1] Cakir E., Parascandolo G., Heittola T., Huttunen H., Virtanen T. Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017, 25(6):1291-1303.
[2] Cano P., 2005. 13th Annual ACM International Conference on Multimedia, p. 211. DOI: 10.1145/1101149.1101181.
[3] Cartwright M., 2019. Proceedings of the Detection and Classification of Acoustic Scenes and Events Workshop (DCASE), p. 35. DOI: 10.33682/J5ZW-2T88.
[4] Chung Y.-A., 2020. arXiv:2004.05274.
[5] Chung Y.-A., 2019. arXiv:1904.03240.
[6] Chung Y.-A., 2020. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), p. 3497. DOI: 10.1109/ICASSP40776.2020.9054438.
[7] Cui X., Lu S., Kingsbury B. Federated Acoustic Modeling for Automatic Speech Recognition. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021), 2021, pp. 6748-6752.
[8] Gao Y., 2021. arXiv:2104.14297.
[9] Gemmeke J.F., 2017. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), p. 776. DOI: 10.1109/ICASSP.2017.7952261.
[10] Gemmeke J.F., 2013. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, p. 1.