Investigating the Effective Dynamic Information of Spectral Shapes for Audio Classification

Times cited: 0

Authors:

Chen, Liangwei [1]

Zhou, Xiren [2]

Chen, Qiuju [3]

Xiong, Fang [4]

Chen, Huanhuan [2]

Affiliations:

[1] Univ Sci & Technol China, Sch Artificial Intelligence & Data Sci, Hefei 230027, Peoples R China

[2] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei 230027, Peoples R China

[3] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei 230027, Peoples R China

[4] Cent South Univ, Xiangya Hosp, Natl Clin Res Ctr Geriatr Dis, Dept Otolaryngol Head & Neck Surg, Changsha 410078, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2025, Vol. 27

Funding:

National Key Research and Development Program of China;

Keywords:

Mel frequency cepstral coefficient; Data models; Spectral shape; Computational modeling; Feature extraction; Fitting; Music; Classification algorithms; Training; Multiple signal classification; Learning in the model space; dynamic information of the spectral shape; audio classification; mel-frequency cepstral coefficients; echo state network; MUSICAL GENRE CLASSIFICATION; FAULT-DIAGNOSIS; MODEL SPACE; RECOGNITION; ALGORITHM;

DOI:

10.1109/TMM.2024.3521837

Chinese Library Classification (CLC) number:

TP [Automation and Computer Technology];

Discipline classification code:

0812;

Abstract:

The spectral shape holds crucial information for Audio Classification (AC), encompassing the spectrum's envelope, details, and dynamic changes over time. Conventional methods utilize cepstral coefficients for spectral shape description but overlook its variation details. Deep-learning approaches capture some dynamics but demand substantial training or fine-tuning resources. The Learning in the Model Space (LMS) framework precisely captures the dynamic information of temporal data by utilizing model fitting, even when computational resources and data are limited. However, applying LMS to audio faces challenges: 1) The high sampling rate of audio hinders efficient data fitting and capturing of dynamic information. 2) The Dynamic Information of Partial Spectral Shapes (DIPSS) may enhance classification, as only specific spectral shapes are relevant for AC. This paper extends an AC framework called Effective Dynamic Information Capture (EDIC) to tackle the above issues. EDIC constructs Mel-Frequency Cepstral Coefficients (MFCC) sequences within different dimensional intervals as the fitted data, which not only reduces the number of sequence sampling points but can also describe the change of the spectral shape in different parts over time. EDIC enables us to implement a topology-based selection algorithm in the model space, selecting effective DIPSS for the current AC task. The performance on three tasks confirms the effectiveness of EDIC.
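As a rough illustration of the idea in the abstract, the sketch below splits an MFCC matrix into sequences over coefficient intervals and fits a small echo state network to each one, using the fitted readout weights as that sequence's point in the model space. It is not the authors' implementation: the function names, the interval width, the reservoir size, the next-frame prediction target, and the ridge penalty are all assumptions made for the example, the input is synthetic data rather than real MFCCs, and the topology-based selection step is not shown.

```python
import numpy as np


def interval_sequences(mfcc, width):
    """Split an (n_frames, n_coeffs) MFCC matrix into sub-sequences,
    one per block of `width` adjacent coefficients (hypothetical layout)."""
    n_coeffs = mfcc.shape[1]
    return [mfcc[:, s:s + width] for s in range(0, n_coeffs, width)]


def make_reservoir(n_in, n_res, spectral_radius=0.9, seed=0):
    """Build fixed random input and recurrent weights for a small reservoir.
    A fixed seed keeps the reservoir identical across clips, so the fitted
    readouts are comparable."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
    w = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w


def fit_readout(seq, w_in, w, ridge=1e-2):
    """Fit a ridge-regression readout that predicts frame t+1 from the
    reservoir state at frame t; return the flattened readout weights."""
    n_res = w.shape[0]
    x = np.zeros(n_res)
    states = []
    for u in seq[:-1]:
        x = np.tanh(w_in @ u + w @ x)
        states.append(x)
    X = np.asarray(states)  # shape (T-1, n_res)
    Y = seq[1:]             # shape (T-1, n_in)
    w_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ Y)
    return w_out.ravel()


def model_space_points(mfcc, width=4, n_res=10):
    """Return one model-space point (readout weight vector) per interval."""
    points = []
    for seq in interval_sequences(mfcc, width):
        w_in, w = make_reservoir(seq.shape[1], n_res)
        points.append(fit_readout(seq, w_in, w))
    return points


if __name__ == "__main__":
    # Synthetic stand-in for a 50-frame, 12-coefficient MFCC matrix.
    demo = np.random.default_rng(42).standard_normal((50, 12))
    for i, p in enumerate(model_space_points(demo)):
        print(f"interval {i}: model-space vector of length {p.size}")
```

Because each fitted sequence has one sample per frame rather than one per audio sample, the fitting problem is far smaller than fitting the raw waveform, which is the efficiency argument the abstract makes. A downstream classifier would then operate on these weight vectors, one set per interval.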

Pages: 1114-1126

Page count: 13
