An 'End-to-Evolution' Hybrid Approach for Snore Sound Classification

Cited by: 18

Authors:

Freitag, Michael ^{[1]}

Amiriparian, Shahin ^{[1,2]}

Cummins, Nicholas ^{[1]}

Gerczuk, Maurice ^{[1]}

Schuller, Bjoern ^{[1,3]}

Affiliations:

[1] Univ Passau, Chair Complex & Intelligent Syst, Passau, Germany

[2] Tech Univ Munich, Machine Intelligence & Signal Proc Grp, Munich, Germany

[3] Imperial Coll London, Machine Learning Grp, London, England

Source:

18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017

Funding:

European Union Horizon 2020;

Keywords:

competitive swarm optimisation; evolutionary feature selection; convolutional neural network; snoring; computational paralinguistics; RECOGNITION; DECEPTION; SELECTION;

DOI:

10.21437/Interspeech.2017-173

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Whilst snoring itself is usually not harmful to a person's health, it can be an indication of Obstructive Sleep Apnoea (OSA), a serious sleep-related disorder. As a result, studies into using snoring as an acoustic-based marker of OSA are gaining in popularity. Motivated by this, the INTERSPEECH 2017 ComParE Snoring sub-challenge requires classifying the areas in the upper airways from which different snoring sounds originate. This paper explores a hybrid approach combining evolutionary feature selection based on competitive swarm optimisation and deep convolutional neural networks (CNNs). Feature selection is applied to novel deep spectrum features extracted directly from spectrograms using pre-trained image classification CNNs. Key results presented demonstrate that our hybrid approach can substantially increase the performance of a linear support vector machine on a set of low-level features extracted from the Snoring sub-challenge data. Even without subset selection, the deep spectrum features are sufficient to outperform the challenge baseline, and competitive swarm optimisation further improves system performance. In comparison to the challenge baseline, unweighted average recall is increased from 40.6 % to 57.6 % on the development partition, and from 58.5 % to 66.5 % on the test partition, using 2 246 of the 4 096 deep spectrum features.
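The abstract reports results as unweighted average recall (UAR), i.e. the mean of the per-class recalls, so that each class counts equally regardless of how many samples it has. The following is a minimal sketch of that metric only, not the authors' evaluation code; the function name, the class labels "A" to "D", and the toy predictions are illustrative assumptions and are not taken from the challenge data.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (hypothetical helper, for illustration)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Average the recalls with equal weight per class, not per sample.
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy, imbalanced example with made-up labels (not challenge data).
y_true = ["A", "A", "A", "A", "B", "B", "C", "D"]
y_pred = ["A", "A", "A", "B", "B", "A", "C", "A"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.4f}, accuracy = {accuracy:.4f}")
```

In this toy case the per-class recalls are 0.75, 0.5, 1.0 and 0.0, so the UAR is lower than the plain accuracy because the missed single-sample class weighs as much as the largest class.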

Pages: 3507-3511

Page count: 5

