A new approach for classification of generic audio data

Cited by: 4
Authors
Lin, RS [1]
Chen, LH [1]
Affiliations
[1] Natl Chiao Tung Univ, Dept Comp & Informat Sci, Hsinchu 30050, Taiwan
Keywords
audio classification; spectrogram; Bayesian decision function; multivariable Gaussian distribution
DOI
10.1142/S0218001405003958
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Existing audio retrieval systems fall into one of two categories: single-domain systems that accept data of only a single type (e.g. speech), and multiple-domain systems that offer content-based retrieval for multiple types of audio data. Because a single-domain system has limited applications, a multiple-domain system is more useful. However, different types of audio data have different properties, which makes a multiple-domain system harder to develop. Classifying audio information in advance addresses this problem. In this paper, we propose a real-time method that classifies audio signals into several basic audio types, such as pure speech, music, song, speech with music background, and speech with environmental noise background. To make the proposed method robust across a variety of audio sources, we use a Bayesian decision function for the multivariable Gaussian distribution instead of manually adjusting a threshold for each discriminator. The proposed approach can be applied to content-based audio/video retrieval. In the experiments, the efficiency and effectiveness of the method are shown by an accuracy rate of more than 96% for general audio data classification.
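The Bayesian decision function for a multivariable Gaussian distribution mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `GaussianBayesClassifier`, the two-dimensional synthetic features, and the small ridge added to each covariance are all assumptions made for illustration. Each class i gets the score g_i(x) = -1/2 (x - mu_i)^T Sigma_i^{-1} (x - mu_i) - 1/2 ln|Sigma_i| + ln P(class i), and the class with the highest score wins, so no per-discriminator threshold has to be tuned by hand.

```python
import numpy as np


class GaussianBayesClassifier:
    """Bayes decision rule with one multivariate Gaussian per class (illustrative)."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.params_ = []
        for c in self.classes_:
            Xc = X[y == c]
            mean = Xc.mean(axis=0)
            # Assumption of this sketch: a small ridge keeps the covariance invertible.
            cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
            _, logdet = np.linalg.slogdet(cov)
            log_prior = np.log(len(Xc) / len(X))
            self.params_.append((mean, np.linalg.inv(cov), logdet, log_prior))
        return self

    def decision_function(self, X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        scores = []
        for mean, cov_inv, logdet, log_prior in self.params_:
            d = X - mean
            # Squared Mahalanobis distance of every row to the class mean.
            maha = np.einsum("ij,jk,ik->i", d, cov_inv, d)
            # The constant -(dim/2) * ln(2*pi) is the same for all classes and is dropped.
            scores.append(-0.5 * maha - 0.5 * logdet + log_prior)
        return np.stack(scores, axis=1)

    def predict(self, X):
        return self.classes_[np.argmax(self.decision_function(X), axis=1)]


# Synthetic stand-ins for audio features (hypothetical, not from the paper).
rng = np.random.default_rng(0)
speech = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], size=200)
music = rng.multivariate_normal([5.0, 5.0], [[1.0, -0.2], [-0.2, 1.0]], size=200)
X = np.vstack([speech, music])
y = np.array(["speech"] * 200 + ["music"] * 200)

clf = GaussianBayesClassifier().fit(X, y)
labels = clf.predict([[0.1, -0.2], [4.8, 5.3]])
print(labels)
```

In a real system the rows of `X` would be feature vectors extracted from audio (the keywords suggest spectrogram-derived features), and there would be one Gaussian per audio type rather than the two synthetic classes used here.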
Pages: 63-78
Page count: 16