Clean vs. Overlapped Speech-Music Detection Using Harmonic-Percussive Features and Multi-Task Learning

Cited by: 0

Authors:

Bhattacharjee, Mrinmoy [1]

Prasanna, S. R. M. [2]

Guha, Prithwijit [1]

Affiliations:

[1] Indian Inst Technol Guwahati, Dept Elect & Elect Engn, Gauhati 781039, Assam, India

[2] Indian Inst Technol Dharwad, Dept Elect Engn, Dharwad 580011, Karnataka, India

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2023 / Vol. 31

Keywords:

Spectrogram; Task analysis; Harmonic analysis; Multiple signal classification; Speech processing; Feature extraction; Training; Speech music overlap detection; harmonic percussive source separation; multi-task learning; radio broadcast audio classification; BACKGROUND MUSIC; NETWORK; NOISE;

DOI:

10.1109/TASLP.2022.3164199

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Detection of speech and music signals in isolated and overlapped conditions is an essential preprocessing step for many audio applications. Speech signals have wavy and continuous harmonics, while music signals exhibit horizontally linear and discontinuous harmonic patterns. Music signals also contain more percussive components than speech signals, manifested as vertical striations in the spectrograms. In the case of speech-music overlap, it might be challenging for automatic feature learning systems to extract class-specific horizontal and vertical striations from the combined spectrogram representation. A preprocessing step that separates the harmonic and percussive components before training might aid the classifier. Thus, this work proposes the use of a harmonic-percussive source separation method to generate features for better detection of speech and music signals. Additionally, this work explores the traditional and cascaded-information multi-task learning (MTL) frameworks to design better classifiers. The MTL framework aids the training of the main task by employing simultaneous learning of several related auxiliary tasks. Results are reported on both synthetically generated speech-music overlapped signals and real recordings. Four state-of-the-art approaches are used for performance comparison. Experiments show that features derived from the harmonic and percussive decomposition of spectrograms perform better. Moreover, the MTL-framework-based classifiers further improve performance.
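
The harmonic-percussive separation idea in the abstract can be illustrated with a small sketch. The abstract does not state which separation algorithm the authors use, so the code below assumes the common median-filtering approach: smoothing along time keeps horizontal (harmonic) ridges, and smoothing along frequency keeps vertical (percussive) ridges. The function names `median_filter_1d` and `hpss_masks`, the kernel size of 17, and the toy spectrogram are all hypothetical choices for illustration, not values from the paper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter_1d(mag, kernel, axis):
    """Median-filter a 2-D array along one axis, using reflect padding."""
    pad = kernel // 2
    pad_width = [(0, 0), (0, 0)]
    pad_width[axis] = (pad, pad)
    padded = np.pad(mag, pad_width, mode="reflect")
    # The window dimension is appended as the last axis of the view.
    windows = sliding_window_view(padded, kernel, axis=axis)
    return np.median(windows, axis=-1)


def hpss_masks(mag, kernel=17, eps=1e-10):
    """Split a magnitude spectrogram (freq x time) into two soft-masked parts.

    Assumption: median-filtering separation with soft masks; the paper's
    actual separation method may differ.
    """
    harm_enh = median_filter_1d(mag, kernel, axis=1)  # smooth along time
    perc_enh = median_filter_1d(mag, kernel, axis=0)  # smooth along frequency
    total = harm_enh + perc_enh + eps
    return mag * (harm_enh / total), mag * (perc_enh / total)


# Toy spectrogram: low background, one horizontal ridge (a sustained
# harmonic) and one vertical ridge (a percussive onset).
mag = np.full((64, 64), 0.01)
mag[20, :] = 1.0   # horizontal ridge at frequency bin 20
mag[:, 30] = 1.0   # vertical ridge at time frame 30

harmonic, percussive = hpss_masks(mag)
```

In this toy case the horizontal ridge ends up mostly in `harmonic` and the vertical ridge mostly in `percussive`, and the two parts sum back to the input up to the small `eps` term. A classifier could then be fed the two components as separate feature channels, which is the general idea the abstract describes.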


Pages: 1-10

Page count: 10
