Learning complementary representations via attention-based ensemble learning for cough-based COVID-19 recognition

Cited by: 0
Authors
Ren, Zhao [1, 2]
Chang, Yi [3]
Nejdl, Wolfgang [2]
Schuller, Bjoern W. [1, 3]
Affiliations
[1] Univ Augsburg, Chair Embedded Intelligence Hlth Care & Wellbeing, D-86159 Augsburg, Germany
[2] Leibniz Univ Hannover, L3S Res Ctr, D-30167 Hannover, Germany
[3] Imperial Coll London, GLAM Grp Language Audio & Mus, London SW7 2AZ, England
Source
ACTA ACUSTICA | 2022 / Vol. 6
Funding
EU Horizon 2020;
Keywords
COVID-19; Cough sound; Ensemble learning; Attention mechanism; Complementary representation; FUSION;
DOI
10.1051/aacus/2022029
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
Cough sounds have shown promise as a potential marker for distinguishing individuals with COVID-19 from those without. In this paper, we propose an attention-based ensemble learning approach to learn complementary representations from cough samples. Unlike most traditional schemes, such as simply taking the maximum or the average, the proposed approach fairly weighs the contribution of the representation generated by each individual model. The attention mechanism is further investigated at both the feature level and the decision level. Evaluated on the Track-1 test set of the DiCOVA Challenge 2021, the proposed feature-level attention-based ensemble learning achieves the best performance (Area Under Curve, AUC: 77.96%), an 8.05% improvement over the challenge baseline.
Pages: 5
Related papers
19 in total
[1]   Reducing chances of COVID-19 infection by a cough cloud in a closed space [J].
Agrawal, Amit ;
Bhardwaj, Rajneesh .
PHYSICS OF FLUIDS, 2020, 32 (10)
[2]   Snore Sound Classification Using Image-based Deep Spectrum Features [J].
Amiriparian, Shahin ;
Gerczuk, Maurice ;
Ottl, Sandra ;
Cummins, Nicholas ;
Freitag, Michael ;
Pugachevskiy, Sergey ;
Baird, Alice ;
Schuller, Bjoern .
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, :3512-3516
[3]   Exploring Automatic Diagnosis of COVID-19 from Crowdsourced Respiratory Sound Data [J].
Brown, Chloe ;
Chauhan, Jagmohan ;
Grammenos, Andreas ;
Han, Jing ;
Hasthanasombat, Apinan ;
Spathis, Dimitris ;
Xia, Tong ;
Cicuta, Pietro ;
Mascolo, Cecilia .
KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2020, :3474-3484
[4]   Multi-modal Conditional Attention Fusion for Dimensional Emotion Prediction [J].
Chen, Shizhe ;
Jin, Qin .
MM'16: PROCEEDINGS OF THE 2016 ACM MULTIMEDIA CONFERENCE, 2016, :571-575
[5]  
Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848
[6]  
Eyben F., 2013, Proceedings of the international conference on Multimedia, P835, DOI 10.1145/2502081.2502224
[7]  
Gemmeke JF, 2017, INT CONF ACOUST SPEE, P776, DOI 10.1109/ICASSP.2017.7952261
[8]   Emotion Recognition in Speech with Latent Discriminative Representations Learning [J].
Han, Jing ;
Zhang, Zixing ;
Keren, Gil ;
Schuller, Bjorn .
ACTA ACUSTICA UNITED WITH ACUSTICA, 2018, 104 (05) :737-740
[9]   Deep Residual Learning for Image Recognition [J].
He, Kaiming ;
Zhang, Xiangyu ;
Ren, Shaoqing ;
Sun, Jian .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :770-778
[10]  
Infante C., 2017, P GHTC SAN JOS CA, P1, DOI 10.1109/GHTC.2017.8239338