Clean vs. Overlapped Speech-Music Detection Using Harmonic-Percussive Features and Multi-Task Learning

Cited by: 0
Authors
Bhattacharjee, Mrinmoy [1]
Prasanna, S. R. M. [2]
Guha, Prithwijit [1]
Affiliations
[1] Indian Inst Technol Guwahati, Dept Elect & Elect Engn, Gauhati 781039, Assam, India
[2] Indian Inst Technol Dharwad, Dept Elect Engn, Dharwad 580011, Karnataka, India
Keywords
Spectrogram; Task analysis; Harmonic analysis; Multiple signal classification; Speech processing; Feature extraction; Training; Speech music overlap detection; harmonic percussive source separation; multi-task learning; radio broadcast audio classification; BACKGROUND MUSIC; NETWORK; NOISE
DOI
10.1109/TASLP.2022.3164199
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Detecting speech and music signals in isolated and overlapped conditions is an essential preprocessing step for many audio applications. Speech signals have wavy, continuous harmonics, while music signals exhibit horizontally linear and discontinuous harmonic patterns. Music signals also contain more percussive components than speech signals, which appear as vertical striations in spectrograms. When speech and music overlap, it can be challenging for automatic feature learning systems to extract class-specific horizontal and vertical striations from the combined spectrogram representation. Separating the harmonic and percussive components as a preprocessing step before training might aid the classifier. This work therefore proposes using a harmonic-percussive source separation method to generate features for better detection of speech and music signals. It also explores the traditional and cascaded-information multi-task learning (MTL) frameworks to design better classifiers. An MTL framework aids the training of the main task through simultaneous learning of several related auxiliary tasks. Results are reported on both synthetically generated speech-music overlapped signals and real recordings. Four state-of-the-art approaches are used for performance comparison. Experiments show that the harmonic and percussive decompositions of spectrograms perform better as features, and that MTL-based classifiers further improve performance.
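The idea of separating horizontal (harmonic) from vertical (percussive) spectrogram structure can be illustrated with a minimal sketch. It assumes a median-filtering style of harmonic-percussive separation with hard masks, which is one common approach; the abstract does not state which separation method or parameters the authors used. The function names, the kernel size, and the toy spectrogram are all hypothetical and chosen only for illustration.

```python
from statistics import median


def median_filter_1d(values, kernel):
    """Median-filter a list; the window shrinks at the edges."""
    half = kernel // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


def hpss_masks(spec, kernel=3):
    """Split a magnitude spectrogram into harmonic and percussive parts.

    spec is a list of rows (frequency bins), each a list of time frames.
    Assumption: hard (binary) masking after median filtering; this is an
    illustrative choice, not the method confirmed by the paper.
    """
    n_freq, n_time = len(spec), len(spec[0])
    # Filtering each row along time keeps horizontal ridges (harmonic).
    harm = [median_filter_1d(row, kernel) for row in spec]
    # Filtering each column along frequency keeps vertical ridges (percussive).
    cols = [
        median_filter_1d([spec[f][t] for f in range(n_freq)], kernel)
        for t in range(n_time)
    ]
    perc = [[cols[t][f] for t in range(n_time)] for f in range(n_freq)]
    # Assign each bin to whichever estimate is larger (ties go to harmonic).
    harmonic = [
        [float(spec[f][t]) if harm[f][t] >= perc[f][t] else 0.0
         for t in range(n_time)]
        for f in range(n_freq)
    ]
    percussive = [
        [float(spec[f][t]) - harmonic[f][t] for t in range(n_time)]
        for f in range(n_freq)
    ]
    return harmonic, percussive


# Toy spectrogram: one horizontal line (row 2) and one vertical line (column 2).
spec = [[1.0 if f == 2 or t == 2 else 0.0 for t in range(5)] for f in range(5)]
harmonic, percussive = hpss_masks(spec)
```

On this toy input the horizontal line ends up in `harmonic` and the rest of the vertical line in `percussive`; the two parts sum back to the original spectrogram. In a setup like the one the abstract describes, such decomposed spectrograms would be fed to the classifier in place of the combined representation.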
Pages: 1 / 10
Page count: 10
Related papers
  • [1] MULTI-TASK LEARNING IMPROVES SYNTHETIC SPEECH DETECTION
    Mo, Yichuan
    Wang, Shilin
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6392 - 6396
  • [2] Improving Speech-Based Dysarthria Detection using Multi-task Learning with Gradient Projection
    Xiang, Yan
    Berisha, Visar
    Liss, Julie
    Chakrabarti, Chaitali
    INTERSPEECH 2024, 2024, : 902 - 906
  • [3] A Multi-Task Learning Approach to Hate Speech Detection Leveraging Sentiment Analysis
    Plaza-Del-Arco, Flor Miriam
    Molina-Gonzalez, M. Dolores
    Urena-Lopez, L. Alfonso
    Martin-Valdivia, Maria Teresa
    IEEE ACCESS, 2021, 9 : 112478 - 112489
  • [4] HHSD: Hindi Hate Speech Detection Leveraging Multi-Task Learning
    Kapil, Prashant
    Kumari, Gitanjali
    Ekbal, Asif
    Pal, Santanu
    Chatterjee, Arindam
    Vinutha, B. N.
    IEEE ACCESS, 2023, 11 : 101460 - 101473
  • [5] Speech Emotion Recognition using Decomposed Speech via Multi-task Learning
    Hsu, Jia-Hao
    Wu, Chung-Hsien
    Wei, Yu-Hung
    INTERSPEECH 2023, 2023, : 4553 - 4557
  • [6] VOICE TOXICITY DETECTION USING MULTI-TASK LEARNING
    Nandwana, Mahesh Kumar
    He, Yifan
    Liu, Joseph
    Yu, Xiao
    Shang, Charles
    Du Bois, Eloi
    McGuire, Morgan
    Bhat, Kiran
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 331 - 335
  • [7] SUPERVISED CHORUS DETECTION FOR POPULAR MUSIC USING CONVOLUTIONAL NEURAL NETWORK AND MULTI-TASK LEARNING
    Wang, Ju-Chiang
    Smith, Jordan B. L.
    Chen, Jitong
    Song, Xuchen
    Wang, Yuxuan
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 566 - 570
  • [8] Speech Emotion Recognition in the Wild using Multi-task and Adversarial Learning
    Parry, Jack
    DeMattos, Eric
    Klementiev, Anita
    Ind, Axel
    Morse-Kopp, Daniela
    Clarke, Georgia
    Palaz, Dimitri
    INTERSPEECH 2022, 2022, : 1158 - 1162
  • [9] Deep Chessboard Corner Detection Using Multi-task Learning
    Yoon, Hyunse
    Lee, Seongmin
    Kang, Jiwoo
    Lee, Sanghoon
    IEEE MMSP 2021: 2021 IEEE 23RD INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 2021,
  • [10] Fetal Cardiac Structure Detection Using Multi-task Learning
    He, Jie
    Yang, Lei
    Zhu, Yunping
    Li, Donglian
    Ding, Zhixing
    Lu, Yuhuan
    Liang, Bocheng
    Li, Shengli
    ADVANCED INTELLIGENT COMPUTING IN BIOINFORMATICS, PT II, ICIC 2024, 2024, 14882 : 405 - 419