Investigating the Effective Dynamic Information of Spectral Shapes for Audio Classification

Cited by: 0
Authors
Chen, Liangwei [1]
Zhou, Xiren [2]
Chen, Qiuju [3]
Xiong, Fang [4]
Chen, Huanhuan [2]
Affiliations
[1] Univ Sci & Technol China, Sch Artificial Intelligence & Data Sci, Hefei 230027, Peoples R China
[2] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei 230027, Peoples R China
[3] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei 230027, Peoples R China
[4] Cent South Univ, Xiangya Hosp, Natl Clin Res Ctr Geriatr Dis, Dept Otolaryngol Head & Neck Surg, Changsha 410078, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
Mel frequency cepstral coefficient; Data models; Spectral shape; Computational modeling; Feature extraction; Fitting; Music; Classification algorithms; Training; Multiple signal classification; Learning in the model space; dynamic information of the spectral shape; audio classification; mel-frequency cepstral coefficients; echo state network; MUSICAL GENRE CLASSIFICATION; FAULT-DIAGNOSIS; MODEL SPACE; RECOGNITION; ALGORITHM;
DOI
10.1109/TMM.2024.3521837
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The spectral shape holds crucial information for Audio Classification (AC), encompassing the spectrum's envelope, details, and dynamic changes over time. Conventional methods utilize cepstral coefficients for spectral shape description but overlook its variation details. Deep-learning approaches capture some dynamics but demand substantial training or fine-tuning resources. The Learning in the Model Space (LMS) framework precisely captures the dynamic information of temporal data by utilizing model fitting, even when computational resources and data are limited. However, applying LMS to audio faces challenges: 1) The high sampling rate of audio hinders efficient data fitting and capturing of dynamic information. 2) The Dynamic Information of Partial Spectral Shapes (DIPSS) may enhance classification, as only specific spectral shapes are relevant for AC. This paper extends an AC framework called Effective Dynamic Information Capture (EDIC) to tackle the above issues. EDIC constructs Mel-Frequency Cepstral Coefficients (MFCC) sequences within different dimensional intervals as the fitted data, which not only reduces the number of sequence sampling points but can also describe the change of the spectral shape in different parts over time. EDIC enables us to implement a topology-based selection algorithm in the model space, selecting effective DIPSS for the current AC task. The performance on three tasks confirms the effectiveness of EDIC.
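The abstract's core idea, fitting a model to MFCC sequences restricted to different coefficient intervals and representing each sequence by its fitted model, can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names `split_intervals` and `fit_readout`, the interval width, the reservoir size, and the ridge penalty are all assumptions chosen for illustration, and random data stands in for real MFCC frames. The echo state network, named in the record's keywords, is used in its common model-space form: a fixed random reservoir shared by all sequences, with the ridge-regressed readout weights serving as the model-space point. The topology-based selection step is not shown.

```python
import numpy as np

def split_intervals(mfcc, width):
    """Split an (n_frames, n_coeffs) matrix into consecutive coefficient intervals."""
    n_coeffs = mfcc.shape[1]
    return [mfcc[:, start:start + width] for start in range(0, n_coeffs, width)]

def fit_readout(seq, n_reservoir=20, ridge=1e-2, seed=0):
    """Fit a one-step-ahead ESN readout to seq and return its flattened weights.

    The reservoir is generated from a fixed seed, so every sequence with the
    same number of columns is fitted against the same reservoir; only the
    readout weights differ between sequences.
    """
    rng = np.random.default_rng(seed)
    n_in = seq.shape[1]
    w_in = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_in))
    w_res = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
    # Rescale the recurrent weights to spectral radius 0.9 (an assumed value).
    w_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(w_res)))

    state = np.zeros(n_reservoir)
    states = []
    for frame in seq[:-1]:
        state = np.tanh(w_in @ frame + w_res @ state)
        states.append(state)
    x = np.asarray(states)  # reservoir states, shape (T-1, n_reservoir)
    y = seq[1:]             # next-frame targets, shape (T-1, n_in)

    # Ridge regression: solve (x^T x + ridge * I) w = x^T y for the readout w.
    w_out = np.linalg.solve(x.T @ x + ridge * np.eye(n_reservoir), x.T @ y)
    return w_out.ravel()

# Synthetic stand-in for an MFCC matrix: 200 frames, 12 coefficients.
mfcc = np.random.default_rng(1).standard_normal((200, 12))
parts = split_intervals(mfcc, width=4)    # three intervals of 4 coefficients
models = [fit_readout(p) for p in parts]  # one model-space point per interval
```

Each entry of `models` is a fixed-length vector (here 20 x 4 = 80 values) that summarizes how one part of the spectral shape evolves over time. Such vectors could then be compared or fed to any standard classifier.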
Pages: 1114-1126
Page count: 13