Separation of speech & music using temporal-spectral features and neural classifiers

Cited by: 3
Authors
Sawant, Omkar [1]
Bhowmick, Anirban [1]
Bhagwat, Ganesh [2]
Affiliations
[1] VIT Bhopal Univ, SEEE, Bhopal, India
[2] Mercedes-Benz, Bangalore, India
Keywords
Music; Speech; MFCC; Spectrograms; SEK; RNN; CNN; SVM; NETWORKS;
DOI
10.1007/s12065-023-00828-0
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Separating speech from music plays a vital role in many areas of audio and speech processing. Spectrograms of speech and music show distinct patterns, which motivates differentiating the two kinds of signal within an audio segment. These patterns are further emphasized using Sobel edge kernels and Mel-spectrograms. For this work, we built a dataset from "All India Radio" news archives containing separate and overlapped speech and music data in different languages. Different input features are extracted from these audio segments and further emphasized before being fed to different classifiers that distinguish speech frames from music frames. We also compare the classification algorithms in terms of accuracy. We find that a convolutional neural network applied to Mel-spectrograms and an MFCC-delta-RNN method give significantly better results than the other approaches. To see how these approaches perform on audio in different languages, we applied the proposed method to three languages: Bengali, Punjabi, and Tamil. Its performance is consistent across all three. The paper also addresses the problem of classifying audio segments with overlapped speech and music regions and achieves a good level of accuracy.
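As an illustration of the edge-emphasis step mentioned in the abstract, the following is a minimal sketch of applying 3x3 Sobel kernels to a spectrogram-like matrix and taking the gradient magnitude. It is not the authors' implementation: the abstract does not give the kernel size, the axis convention, or how the emphasized features are combined, so the function names (`filter3x3`, `sobel_emphasis`), the rows-as-frequency layout, and the toy input are all assumptions made for the example.

```python
import math

# Standard 3x3 Sobel kernels (assumed; the abstract does not specify them).
# SOBEL_X responds to change along the time axis (columns),
# SOBEL_Y responds to change along the frequency axis (rows).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def filter3x3(image, kernel):
    """Correlate a 2-D list with a 3x3 kernel over the 'valid' region only."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(1, rows - 1):
        out_row = []
        for c in range(1, cols - 1):
            acc = 0.0
            for i in range(3):
                for j in range(3):
                    acc += kernel[i][j] * image[r + i - 1][c + j - 1]
            out_row.append(acc)
        out.append(out_row)
    return out


def sobel_emphasis(spectrogram):
    """Return the Sobel gradient magnitude of a (frequency x time) matrix."""
    gx = filter3x3(spectrogram, SOBEL_X)
    gy = filter3x3(spectrogram, SOBEL_Y)
    return [
        [math.hypot(a, b) for a, b in zip(row_x, row_y)]
        for row_x, row_y in zip(gx, gy)
    ]


# Hypothetical toy input: 6 frequency bins x 6 frames with one sustained
# tone in bin 2, loosely resembling a steady musical harmonic.
toy = [[0.0] * 6 for _ in range(6)]
toy[2] = [1.0] * 6

edges = sobel_emphasis(toy)
for row in edges:
    print(row)
```

In this toy case the sustained tone produces a strong response along its upper and lower boundaries and none elsewhere, which is the kind of pattern emphasis the abstract describes. A real pipeline would start from an actual spectrogram or Mel-spectrogram computed from audio frames.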
Pages: 1389-1403
Page count: 15