Audio Scanning Network: Bridging Time and Frequency Domains for Audio Classification

Cited by: 0
Authors
Chen, Liangwei [1]
Zhou, Xiren [2]
Chen, Huanhuan [2]
Affiliations
[1] Univ Sci & Technol China, Sch Data Sci, Hefei, Peoples R China
[2] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei, Peoples R China
Source
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 10 | 2024
Funding
National Key Research and Development Program of China;
Keywords
CANONICAL CORRELATION-ANALYSIS; RECOGNITION;
DOI
Not available
CLC number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the rapid growth of audio data, there is a pressing need for automatic audio classification. As a type of time-series data, audio exhibits waveform fluctuations in both the time and frequency domains that evolve over time, and similar instances share consistent patterns. This study introduces the Audio Scanning Network (ASNet), which exploits this information to achieve stable and effective audio classification. ASNet uses reservoir computing to capture real-time changes in audio waveforms in both the time and frequency domains, supported by Reservoir Kernel Canonical Correlation Analysis (RKCCA), which explores the correlations between time-domain and frequency-domain waveform fluctuations. This approach enables ASNet to comprehensively capture the changes and inherent correlations within the audio waveform without time-consuming iterative training. Instead of converting audio into spectrograms, ASNet directly uses audio feature sequences to uncover associations between time-domain and frequency-domain fluctuations. Experiments on environmental sound and music genre classification tasks show that ASNet achieves performance comparable to state-of-the-art methods.
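To make the two ingredients named in the abstract concrete, here is a minimal NumPy sketch under loudly stated assumptions: fixed random echo-state reservoirs are driven separately by a time-domain and a frequency-domain feature sequence, and a plain regularized *linear* CCA is then computed between the two reservoir state sequences. This is not RKCCA and not the authors' ASNet; the kernelized formulation, the real feature sequences, and every name and value here (`make_reservoir`, `run_reservoir`, `frame_features`, `linear_cca`, the RMS/zero-crossing and band-energy features, reservoir size 50, spectral radius 0.9) are hypothetical illustrations. The sketch does show why no iterative training is involved: the reservoir weights are random and fixed, and CCA is solved in closed form with an eigendecomposition and an SVD.

```python
import numpy as np


def make_reservoir(n_in, n_res, spectral_radius=0.9, input_scale=0.5, seed=0):
    """Random echo-state reservoir: input and recurrent weights are fixed, never trained."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-input_scale, input_scale, size=(n_res, n_in))
    w = rng.uniform(-1.0, 1.0, size=(n_res, n_res))
    # Rescale so the largest |eigenvalue| equals spectral_radius (< 1 is the usual echo-state heuristic).
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w


def run_reservoir(u, w_in, w):
    """Drive the reservoir with a (T, n_in) sequence; return the (T, n_res) state sequence."""
    states = np.zeros((u.shape[0], w.shape[0]))
    x = np.zeros(w.shape[0])
    for t, u_t in enumerate(u):
        x = np.tanh(w_in @ u_t + w @ x)  # new state mixes the current input and the previous state
        states[t] = x
    return states


def frame_features(signal, frame_len=256, hop=128, n_bands=8):
    """Hypothetical features: per-frame time-domain (RMS, zero-crossing rate)
    and frequency-domain (log band energy) sequences."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    zcr = np.mean(np.diff(np.sign(frames), axis=1) != 0, axis=1)
    time_feats = np.column_stack([rms, zcr])
    spectrum = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    bands = np.array_split(spectrum, n_bands, axis=1)
    freq_feats = np.log1p(np.column_stack([b.sum(axis=1) for b in bands]))
    return time_feats, freq_feats


def standardize(a):
    """Z-score each feature column so the reservoir inputs share a common scale."""
    return (a - a.mean(axis=0)) / (a.std(axis=0) + 1e-8)


def inv_sqrt(c):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(c)
    return (vecs / np.sqrt(vals)) @ vecs.T


def linear_cca(x, y, k=3, reg=1e-3):
    """Regularized linear CCA between two (T, d) sequences, solved in closed form.
    Returns the top-k canonical correlations and both projection matrices."""
    xc, yc = x - x.mean(axis=0), y - y.mean(axis=0)
    n = x.shape[0] - 1
    cxx = xc.T @ xc / n + reg * np.eye(x.shape[1])
    cyy = yc.T @ yc / n + reg * np.eye(y.shape[1])
    cxy = xc.T @ yc / n
    cxx_is, cyy_is = inv_sqrt(cxx), inv_sqrt(cyy)
    # Singular values of the whitened cross-covariance are the (regularized) canonical correlations.
    u, s, vt = np.linalg.svd(cxx_is @ cxy @ cyy_is)
    return s[:k], cxx_is @ u[:, :k], cyy_is @ vt[:k].T


# Usage on a synthetic 1-second, 16 kHz amplitude-modulated tone with added noise.
rng = np.random.default_rng(1)
t = np.arange(16000) / 16000.0
signal = np.sin(2 * np.pi * 440 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))
signal += 0.1 * rng.standard_normal(t.size)

time_feats, freq_feats = frame_features(signal)
res_t = make_reservoir(time_feats.shape[1], 50, seed=2)
res_f = make_reservoir(freq_feats.shape[1], 50, seed=3)
states_t = run_reservoir(standardize(time_feats), *res_t)
states_f = run_reservoir(standardize(freq_feats), *res_f)
corrs, w_t, w_f = linear_cca(states_t, states_f, k=3)

# One hypothetical fixed-length clip descriptor: time-averaged canonical projections.
descriptor = np.concatenate([(states_t @ w_t).mean(axis=0), (states_f @ w_f).mean(axis=0)])
print(np.round(corrs, 3))
```

The descriptor at the end is only one illustrative way to turn a variable-length clip into a fixed-length vector; a closed-form classifier such as ridge regression could then be fit on such vectors without gradient-based training.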
Pages: 11355 / +
Page count: 10
Related papers
50 in total (first 10 shown)
  • [1] AUDIO CLASSIFICATION FROM TIME-FREQUENCY TEXTURE
    Yu, Guoshen
    Slotine, Jean-Jacques
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009, : 1677 - +
  • [2] Scanning dial: the instantaneous audio classification transformer
    Jiang, Huawei
    Mutahira, Husna
    Park, Unsang
    Muhammad, Mannan Saeed
    DISCOVER APPLIED SCIENCES, 2024, 6 (03)
  • [3] Histogram of Gradients of Time-Frequency Representations for Audio Scene Classification
    Rakotomamonjy, Alain
    Gasso, Gilles
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (01) : 142 - 153
  • [4] Time-Frequency Feature Fusion for Noise Robust Audio Event Classification
    McLoughlin, Ian
    Xie, Zhipeng
    Song, Yan
    Phan, Huy
    Palaniappan, Ramaswamy
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2020, 39 (03) : 1672 - 1687
  • [5] Deep Belief Network based audio classification for construction sites monitoring
    Scarpiniti, Michele
    Colasante, Francesco
    Di Tanna, Simone
    Ciancia, Marco
    Lee, Yong-Cheol
    Uncini, Aurelio
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 177
  • [6] Feature pyramid attention network for audio-visual scene classification
    Zhou, Liguang
    Zhou, Yuhongze
    Qi, Xiaonan
    Hu, Junjie
    Lam, Tin Lun
    Xu, Yangsheng
    CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2024
  • [7] Investigating Combinations of Visual Audio Features and Distance Metrics in the Problem of Audio Classification
    Forczmanski, Pawel
    Maka, Tomasz
    PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON COMPUTER RECOGNITION SYSTEMS, CORES 2015, 2016, 403 : 733 - 744
  • [8] A Classification Method for Environmental Audio Data
    Li, Ying
    2ND IEEE INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER CONTROL (ICACC 2010), VOL. 2, 2010, : 355 - 361
  • [9] Learning long-term filter banks for audio source separation and audio scene classification
    Zhang, Teng
    Wu, Ji
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2018
  • [10] A Survey of Audio Classification Using Deep Learning
    Zaman, Khalid
    Sah, Melike
    Direkoglu, Cem
    Unoki, Masashi
    IEEE ACCESS, 2023, 11 : 106620 - 106649