Multi-Task Learning for Audio-Based Infant Cry Detection and Reasoning

Cited by: 1
Authors
Xia, Ming [1]
Huang, Dongmin [1]
Wang, Wenjin [1]
Affiliations
[1] Southern Univ Sci & Technol, Dept Biomed Engn, Shenzhen 518055, Peoples R China
Funding
Hainan Provincial Natural Science Foundation;
Keywords
Pediatrics; Task analysis; Feature extraction; Cognition; Multitasking; Support vector machines; Spectrogram; Audio; infant cry detection; infant cry reason classification; multi-task learning;
DOI
10.1109/JBHI.2024.3454097
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
An infant's cry is a crucial indicator of the infant's physical and mental condition, such as hunger and pain. However, the scarcity of infant cry datasets hinders a model's generalization in real-life scenarios. The varying voiceprint characteristics among infants further exacerbate this challenge, degrading the model's performance on unseen infants. To this end, we propose a multi-task model for Infant Cry Detection and Reasoning (ICDR). It leverages datasets from two tasks to enrich data diversity and introduces an efficient attention module so that features from one task can supplement the other. To mitigate the impact of subject differences, ICDR introduces an intra-task contrastive mixture of experts (CMoE) module that adaptively allocates experts to reduce subject variance and applies contrastive learning to enhance the representation consistency of samples from different infants in the same state. Extensive cross-subject experiments show that ICDR outperforms state-of-the-art models in infant cry detection and reasoning, with an improvement of 2-9% in F1-score. This demonstrates that multi-task learning effectively enhances the model's generalization ability through inter-task attention and intra-task CMoE.
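The abstract's idea of pulling together samples "from different infants in the same state" can be illustrated with a generic supervised contrastive loss. The sketch below is not the paper's CMoE module: the function name, the temperature value, and the toy embeddings are assumptions made purely for illustration, and the record does not specify which contrastive objective ICDR actually uses.

```python
import math


def supervised_contrastive_loss(embeddings, states, temperature=0.1):
    """Generic supervised contrastive loss (illustrative, not the paper's code).

    embeddings: list of equal-length float vectors, one per cry sample.
    states: list of state labels (e.g. "hunger", "pain"). Samples that share
            a label count as positives, whichever infant they come from.
    Returns the mean loss over anchors that have at least one positive.
    """
    # L2-normalise so that dot products are cosine similarities.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        normed.append([x / norm for x in vec])

    n = len(normed)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and states[j] == states[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled similarity between anchor i and every other sample.
        sims = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Log-sum-exp over all non-anchor samples, max-shifted for stability.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Mean negative log-probability assigned to the positives.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Hypothetical 2-D embeddings: the first two vectors are close, as are the last two.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
consistent = supervised_contrastive_loss(toy, ["hunger", "hunger", "pain", "pain"])
inconsistent = supervised_contrastive_loss(toy, ["hunger", "pain", "hunger", "pain"])
```

In this toy case the loss is small when embeddings of the same state lie close together (`consistent`) and large when same-state samples are far apart (`inconsistent`), which is the kind of representation consistency the abstract describes.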
Pages: 7434-7446
Page count: 13
Related papers
50 in total
  • [21] CURRICULUM BASED MULTI-TASK LEARNING FOR PARKINSON'S DISEASE DETECTION
    Dhinagar, Nikhil J.
    Owens-Walton, Conor
    Laltoo, Emily
    Boyle, Christina P.
    Chen, Yao-Liang
    Cook, Philip
    McMillan, Corey
    Tsai, Chih-Chien
    Wang, J-J
    Wu, Yih-Ru
    Van der Werf, Ysbrand
    Thompson, Paul M.
    2023 IEEE 20TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING, ISBI, 2023,
  • [22] Contrastive Learning based Multi-task Network for Image Manipulation Detection
    Yin, Qilin
    Wang, Jinwei
    Lu, Wei
    Luo, Xiangyang
    SIGNAL PROCESSING, 2022, 201
  • [23] Smart Contract Vulnerability Detection Model Based on Multi-Task Learning
    Huang, Jing
    Zhou, Kuo
    Xiong, Ao
    Li, Dongmeng
    SENSORS, 2022, 22 (05)
  • [24] Event Detection via Context Understanding Based on Multi-task Learning
    Xia, Jing
    Li, Xiaolong
    Tan, Yongbin
    Zhang, Wu
    Li, Dajun
    Xiong, Zhengkun
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (01)
  • [25] Multi-task Learning Based on Multiple Data Sources for Cancer Detection
    Hong, Siyi
    2021 3RD INTERNATIONAL CONFERENCE ON MACHINE LEARNING, BIG DATA AND BUSINESS INTELLIGENCE (MLBDBI 2021), 2021, : 486 - 491
  • [26] Multi-Task Learning Based Joint Pulse Detection and Modulation Classification
    Akyon, Fatih Cagatay
    Nuhoglu, Mustafa Atahan
    Alp, Yasar Kemal
    Arikan, Orhan
    2019 27TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2019,
  • [27] DNN-Based Voice Activity Detection with Multi-Task Learning
    Kang, Tae Gyoon
    Kim, Nam Soo
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2016, E99D (02): : 550 - 553
  • [28] Image Inpainting Detection Based on Multi-task Deep Learning Network
    Wang, Xinyi
    Niu, Shaozhang
    Wang, He
    IETE TECHNICAL REVIEW, 2021, 38 (01) : 149 - 157
  • [29] Audio-Based Epileptic Seizure Detection
    Ahsan, M. N. Istiaq
    Kertesz, Csaba
    Mesaros, Annamaria
    Heittola, Toni
    Knight, Andrew
    Virtanen, Tuomas
    2019 27TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2019,
  • [30] Multimodal and Multi-task Audio-Visual Vehicle Detection and Classification
    Wang, Tao
    Zhu, Zhigang
    2012 IEEE NINTH INTERNATIONAL CONFERENCE ON ADVANCED VIDEO AND SIGNAL-BASED SURVEILLANCE (AVSS), 2012, : 440 - 446