Audio Scanning Network: Bridging Time and Frequency Domains for Audio Classification

Cited by: 0
Authors
Chen, Liangwei [1]
Zhou, Xiren [2]
Chen, Huanhuan [2]
Affiliations
[1] Univ Sci & Technol China, Sch Data Sci, Hefei, Peoples R China
[2] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei, Peoples R China
Source
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 10 | 2024
Funding
National Key Research and Development Program of China;
Keywords
CANONICAL CORRELATION-ANALYSIS; RECOGNITION;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the rapid growth of audio data, there is a pressing need for automatic audio classification. As a type of time-series data, audio exhibits waveform fluctuations in both the time and frequency domains that evolve over time, with similar instances sharing consistent patterns. This study introduces the Audio Scanning Network (ASNet), designed to leverage this abundant information to achieve stable and effective audio classification. ASNet captures real-time changes in audio waveforms across both the time and frequency domains through reservoir computing, supported by Reservoir Kernel Canonical Correlation Analysis (RKCCA) to explore correlations between time-domain and frequency-domain waveform fluctuations. This approach enables ASNet to comprehensively capture the changes and inherent correlations within the audio waveform without the need for time-consuming iterative training. Instead of converting audio into spectrograms, ASNet directly uses audio feature sequences to uncover associations between time and frequency fluctuations. Experiments on environmental sound and music genre classification tasks demonstrate that ASNet achieves performance comparable to state-of-the-art methods.
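The abstract's point about avoiding iterative training can be illustrated with a generic echo state network, the most common form of reservoir computing. The sketch below is not ASNet and does not implement RKCCA; it only assumes a fixed random reservoir whose states summarize a feature sequence, followed by a readout fitted in closed form by ridge regression. All function names, the toy sine-wave clips, and the hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np


def make_reservoir(n_inputs, n_units, spectral_radius=0.9, seed=0):
    """Draw fixed random input and recurrent weights (never trained)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_units, n_inputs))
    w = rng.uniform(-0.5, 0.5, size=(n_units, n_units))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w


def reservoir_states(seq, w_in, w):
    """Run a (time, n_inputs) sequence through the reservoir."""
    state = np.zeros(w.shape[0])
    states = []
    for x in seq:
        state = np.tanh(w_in @ x + w @ state)
        states.append(state)
    return np.array(states)


def embed(seq, w_in, w):
    """Summarize a sequence by the mean and std of its reservoir states."""
    s = reservoir_states(seq, w_in, w)
    return np.concatenate([s.mean(axis=0), s.std(axis=0)])


def fit_readout(features, targets, ridge=1e-2):
    """Closed-form ridge regression: a single linear solve, no epochs."""
    a = features.T @ features + ridge * np.eye(features.shape[1])
    return np.linalg.solve(a, features.T @ targets)


# Toy data (assumption): noisy sine waves at two frequencies, one per class.
rng = np.random.default_rng(1)
t = np.arange(200)


def toy_clip(freq):
    wave = np.sin(2 * np.pi * freq * t / t.size)
    wave = wave + 0.1 * rng.standard_normal(t.size)
    return wave[:, None]  # shape (200, 1): one feature per time step


labels = np.array([0, 1] * 10)
clips = [toy_clip(3.0 if y == 0 else 8.0) for y in labels]

w_in, w = make_reservoir(n_inputs=1, n_units=50)
embeddings = np.array([embed(c, w_in, w) for c in clips])
w_out = fit_readout(embeddings, np.eye(2)[labels])  # one-hot targets
predicted = (embeddings @ w_out).argmax(axis=1)
```

Only `w_out` is fitted here, and it comes from one linear solve; the reservoir weights stay at their random initial values. That is the general sense in which reservoir methods sidestep gradient-based training, though how ASNet itself builds and combines its time-domain and frequency-domain representations is described only in the paper.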
Pages: 11355 / +
Number of pages: 10