Audio Scanning Network: Bridging Time and Frequency Domains for Audio Classification

Times cited: 0
Authors
Chen, Liangwei [1]
Zhou, Xiren [2]
Chen, Huanhuan [2]
Affiliations
[1] Univ Sci & Technol China, Sch Data Sci, Hefei, Peoples R China
[2] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei, Peoples R China
Source
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 10 | 2024
Funding
National Key R&D Program of China;
Keywords
CANONICAL CORRELATION-ANALYSIS; RECOGNITION;
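DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract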
With the rapid growth of audio data, there is a pressing need for automatic audio classification. As a type of time-series data, audio exhibits waveform fluctuations in both the time and frequency domains that evolve over time, and similar instances share consistent fluctuation patterns. This study introduces the Audio Scanning Network (ASNet), which exploits this information to achieve stable and effective audio classification. ASNet uses reservoir computing to capture how audio waveforms change over time in both the time and frequency domains, and employs Reservoir Kernel Canonical Correlation Analysis (RKCCA) to explore the correlations between time-domain and frequency-domain waveform fluctuations. This allows ASNet to capture both the dynamics of the audio waveform and the correlations inherent in it, without time-consuming iterative training. Rather than converting audio into spectrograms, ASNet operates directly on audio feature sequences to uncover associations between time-domain and frequency-domain fluctuations. Experiments on environmental sound classification and music genre classification show that ASNet achieves performance comparable to state-of-the-art methods.
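The abstract names two ingredients: a reservoir that turns time-domain and frequency-domain feature sequences into dynamic states without iterative training, and a kernel canonical correlation analysis between the two sets of states. The following is a minimal NumPy sketch of that general idea under stated assumptions; it is not the authors' ASNet or RKCCA. It assumes a leaky echo state network as the reservoir, treats paired time steps of the two reservoirs as the samples for kernel CCA, uses an RBF kernel, and uses a common regularized kernel CCA formulation in which the canonical correlations are the singular values of R_x R_y, with R = (K + κI)^{-1} K computed on centered Gram matrices. The function names (reservoir_states, rbf_gram, center, kernel_cca_correlations), all hyperparameters, and the random placeholder feature sequences are hypothetical.

```python
import numpy as np


def reservoir_states(u, n_res=50, spectral_radius=0.9, leak=0.3, input_scale=0.5, seed=0):
    """Hypothetical leaky echo state reservoir: drive it with u (T x d), return states (T x n_res)."""
    rng = np.random.default_rng(seed)
    T, d = u.shape
    w_in = rng.uniform(-input_scale, input_scale, size=(n_res, d))  # fixed, untrained input weights
    w = rng.uniform(-0.5, 0.5, size=(n_res, n_res))                 # fixed, untrained recurrent weights
    # Rescale recurrent weights so their largest |eigenvalue| equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    x = np.zeros(n_res)
    states = np.empty((T, n_res))
    for t in range(T):
        # Leaky update: convex mix of the previous state and a tanh activation, so |x| stays below 1.
        x = (1 - leak) * x + leak * np.tanh(w_in @ u[t] + w @ x)
        states[t] = x
    return states


def rbf_gram(z, gamma):
    """RBF (Gaussian) Gram matrix over the rows of z."""
    sq = np.sum(z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * z @ z.T, 0.0)
    return np.exp(-gamma * d2)


def center(k):
    """Center a Gram matrix in feature space: H K H with H = I - 11^T / n."""
    n = k.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    return h @ k @ h


def kernel_cca_correlations(zx, zy, gamma=0.1, kappa=0.1, k=3):
    """Top-k regularized kernel CCA correlations between paired rows of zx and zy (assumed formulation)."""
    kx = center(rbf_gram(zx, gamma))
    ky = center(rbf_gram(zy, gamma))
    eye = np.eye(kx.shape[0])
    rx = np.linalg.solve(kx + kappa * eye, kx)  # (Kx + kappa I)^{-1} Kx
    ry = np.linalg.solve(ky + kappa * eye, ky)  # (Ky + kappa I)^{-1} Ky
    # Canonical correlations are the singular values of Rx Ry (returned in descending order).
    return np.linalg.svd(rx @ ry, compute_uv=False)[:k]


# Usage with hypothetical random placeholders for per-frame time- and frequency-domain features.
rng = np.random.default_rng(42)
T = 200
time_feats = rng.standard_normal((T, 2))    # placeholder time-domain features per frame
freq_feats = rng.standard_normal((T, 13))   # placeholder frequency-domain features per frame
hx = reservoir_states(time_feats, seed=1)
hy = reservoir_states(freq_feats, seed=2)
descriptor = np.concatenate([hx.mean(axis=0), hy.mean(axis=0), kernel_cca_correlations(hx, hy)])
print(descriptor.shape)  # prints (103,)
```

In this sketch only random, fixed weights are used up to the descriptor, so no gradient-based training occurs; how such a descriptor would be fed to a classifier is not specified in the abstract, and a simple linear readout would be one assumption among several.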
Pages: 11355 / + (end page not listed)
Number of pages: 10