Multi-Task Learning for Audio-Based Infant Cry Detection and Reasoning

Cited by: 1
Authors
Xia, Ming [1]
Huang, Dongmin [1]
Wang, Wenjin [1]
Affiliations
[1] Southern Univ Sci & Technol, Dept Biomed Engn, Shenzhen 518055, Peoples R China
Funding
Natural Science Foundation of Hainan Province
Keywords
Pediatrics; Task analysis; Feature extraction; Cognition; Multitasking; Support vector machines; Spectrogram; Audio; infant cry detection; infant cry reason classification; multi-task learning
DOI
10.1109/JBHI.2024.3454097
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Infant cry is a crucial indicator that offers valuable insights into an infant's physical and mental condition, such as hunger and pain. However, the scarcity of infant cry datasets hinders model generalization in real-life scenarios. The varying voiceprint characteristics among infants further exacerbate this challenge, degrading model performance on unseen infants. To this end, we propose a multi-task model for Infant Cry Detection and Reasoning (ICDR). It leverages datasets from the two tasks to enrich data diversity and introduces an efficient attention module so that features from each task supplement the other. To mitigate the impact of subject differences, ICDR introduces an intra-task contrastive mixture of experts (CMoE) module that adaptively allocates experts to reduce subject variance and applies contrastive learning to enhance the representation consistency of samples from different infants in the same state. Extensive cross-subject experiments show that ICDR outperforms state-of-the-art models in infant cry detection and reasoning, with an improvement of 2-9% in F1-score. This demonstrates that multi-task learning, through inter-task attention and intra-task CMoE, effectively enhances the model's generalization ability.
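As an illustration of the two ideas the abstract combines in the CMoE module, the sketch below shows a generic softmax-gated mixture of experts and a generic supervised contrastive loss in which samples sharing a state label (for example "hunger" or "pain") are treated as positives regardless of which infant produced them. This is not the authors' implementation: the function names, the linear gate, the cosine similarity, and the temperature value are all assumptions chosen for a minimal, dependency-free example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_experts(x, experts, gate_weights):
    """Hypothetical soft MoE: weight each expert's output by a softmax gate.

    gate_weights holds one weight vector per expert (assumed linear gate).
    Returns the mixed output vector and the gate probabilities.
    """
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    gates = softmax(logits)
    outputs = [expert(x) for expert in experts]
    dim = len(outputs[0])
    mixed = [sum(g * out[d] for g, out in zip(gates, outputs)) for d in range(dim)]
    return mixed, gates


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)


def supervised_contrastive_loss(embeddings, states, temperature=0.1):
    """Generic supervised contrastive loss (assumed form, not from the paper).

    For each anchor, samples with the same state label are positives and all
    other samples are negatives. Anchors without any positive are skipped.
    """
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and states[j] == states[i]]
        if not positives:
            continue
        sims = {
            j: cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(n)
            if j != i
        }
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

With this formulation, the loss is small when embeddings of cries in the same state lie close together and large when they do not, which is the consistency property the abstract attributes to contrastive learning across infants.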
Pages: 7434-7446
Number of pages: 13