A CNN-GRU APPROACH TO CAPTURE TIME-FREQUENCY PATTERN INTERDEPENDENCE FOR SNORE SOUND CLASSIFICATION

Times cited: 0
Authors
Wang, Jianhong [1]
Stromfelt, Harald [1]
Schuller, Bjoern W. [1, 2]
Affiliations
[1] Imperial Coll London, Dept Comp, London, England
[2] Univ Augsburg, Chair Embedded Intelligence Hlth Care & Wellbeing, Augsburg, Germany
Source
2018 26TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO) | 2018
Keywords
DualConvGRU Network; Dual Convolutional Layers; Channel Slice Model; EMOTION; RECOGNITION; DECEPTION;
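DOI
Not available
Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808; 0809;
Abstract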
In this work, we propose an architecture named the DualConvGRU Network to address the INTERSPEECH 2017 ComParE Snoring sub-challenge. In this network, we devise two new models: the Dual Convolutional Layer, which is applied to a spectrogram to extract features, and the Channel Slice Model, which reprocesses the extracted features. The first amalgamates an ensemble of information collected from two types of convolutional operations, with differing kernel dimensions on the frequency axis and equal dimensions on the time axis. In the second, the dependencies along the channel axis of the convolutional layer are learnt by feeding channel slices into a Gated Recurrent Unit (GRU) layer. With this approach, convolutional layers can be connected to sequential models without the use of fully connected layers. Compared with other state-of-the-art methods submitted to the INTERSPEECH 2017 ComParE Snoring sub-challenge, our method ranks 5th in test-set performance. Moreover, apart from the baseline, ours is the only entry to train a deep learning model solely on the provided training data. Our model also outperforms the baseline by a substantial margin.
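The data flow described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not give kernel sizes, filter counts, how the two convolutional branches are merged, or the exact shape of a channel slice. The sketch assumes frequency-axis kernel heights of 3 and 5 with a shared time-axis width of 3, merging by concatenation along the channel axis, and one flattened feature map per GRU step. All function names (`conv2d_same`, `gru_step`, `dual_conv_gru`) and all sizes are hypothetical, and the weights are random rather than trained.

```python
import math
import random


def conv2d_same(spec, kernel):
    """Zero-padded ('same') 2-D cross-correlation of one map, followed by ReLU."""
    n_freq, n_time = len(spec), len(spec[0])
    k_f, k_t = len(kernel), len(kernel[0])
    out = [[0.0] * n_time for _ in range(n_freq)]
    for f in range(n_freq):
        for t in range(n_time):
            acc = 0.0
            for i in range(k_f):
                for j in range(k_t):
                    ff, tt = f + i - k_f // 2, t + j - k_t // 2
                    if 0 <= ff < n_freq and 0 <= tt < n_time:
                        acc += spec[ff][tt] * kernel[i][j]
            out[f][t] = max(acc, 0.0)
    return out


def matvec(mat, vec):
    """Matrix-vector product on nested lists."""
    return [sum(w * v for w, v in zip(row, vec)) for row in mat]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(params, x, h):
    """One standard GRU update (biases omitted for brevity)."""
    w_z, u_z, w_r, u_r, w_h, u_h = params
    z = [sigmoid(a + b) for a, b in zip(matvec(w_z, x), matvec(u_z, h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(w_r, x), matvec(u_r, h))]
    gated = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(w_h, x), matvec(u_h, gated))]
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def dual_conv_gru(spec, n_filters=2, hidden=4, seed=0):
    """Run a spectrogram (freq x time nested list) through the sketched pipeline."""
    rng = random.Random(seed)

    def rand_mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    # Dual convolution: two branches whose kernels differ only on the
    # frequency axis (assumed heights 3 and 5) and share the time width (3).
    # Assumption: the branches are merged by concatenating their channels.
    channels = []
    for k_freq in (3, 5):
        for _ in range(n_filters):
            channels.append(conv2d_same(spec, rand_mat(k_freq, 3)))

    # Channel slicing (assumed form): each channel's feature map is flattened
    # and becomes one step of the sequence, so the GRU runs along the channel
    # axis and no fully connected layer sits between the two stages.
    sequence = [[v for row in ch for v in row] for ch in channels]

    in_dim = len(sequence[0])
    # Parameter order: w_z, u_z, w_r, u_r, w_h, u_h.
    params = [rand_mat(hidden, in_dim if i % 2 == 0 else hidden) for i in range(6)]
    h = [0.0] * hidden
    for x in sequence:
        h = gru_step(params, x, h)
    return channels, h


# Usage: a random 8-bin x 6-frame "spectrogram" stands in for real audio features.
demo_rng = random.Random(1)
demo_spec = [[demo_rng.random() for _ in range(6)] for _ in range(8)]
demo_channels, demo_state = dual_conv_gru(demo_spec)
print(len(demo_channels), len(demo_state))
```

Because both branches use "same" padding, their feature maps keep the input's frequency and time dimensions and can be stacked channel-wise without cropping. The final GRU state would then feed a classifier, which is left out here.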
Pages: 997-1001
Number of pages: 5