Audio Scanning Network: Bridging Time and Frequency Domains for Audio Classification

Times cited: 0
Authors
Chen, Liangwei [1]
Zhou, Xiren [2]
Chen, Huanhuan [2]
Affiliations
[1] Univ Sci & Technol China, Sch Data Sci, Hefei, Peoples R China
[2] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei, Peoples R China
Source
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 10 | 2024
Funding
National Key Research and Development Program of China
Keywords
CANONICAL CORRELATION-ANALYSIS; RECOGNITION;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
With the rapid growth of audio data, there is a pressing need for automatic audio classification. As a type of time-series data, audio exhibits waveform fluctuations in both the time and frequency domains that evolve over time, and similar instances share consistent patterns. This study introduces the Audio Scanning Network (ASNet), which exploits this information to achieve stable and effective audio classification. ASNet uses reservoir computing to capture real-time changes in audio waveforms in both the time and frequency domains, and Reservoir Kernel Canonical Correlation Analysis (RKCCA) to explore the correlations between time-domain and frequency-domain waveform fluctuations. This approach enables ASNet to capture both the changes within the audio waveform and their inherent correlations, without time-consuming iterative training. Instead of converting audio into spectrograms, ASNet works directly on audio feature sequences to uncover associations between time-domain and frequency-domain fluctuations. Experiments on environmental sound classification and music genre classification show that ASNet achieves performance comparable to state-of-the-art methods.
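The abstract combines two generic ingredients: reservoir computing applied to time-domain and frequency-domain feature sequences, and a kernel canonical correlation analysis between the two views. The NumPy sketch below illustrates those building blocks under stated assumptions. It is not the authors' ASNet or their RKCCA. The per-frame features, the names `frame_features`, `Reservoir`, `standardize` and `kcca_top_correlation`, the choice to summarize each clip by its time-averaged reservoir state, and all hyperparameters are hypothetical. The kernel CCA shown is a standard regularized formulation with RBF kernels. As in the abstract, the reservoir weights are fixed and random, so nothing is trained iteratively.

```python
import numpy as np


def frame_features(signal, frame_len=256, hop=128, n_bands=8):
    """Return (time_view, freq_view) per-frame feature sequences for a 1-D signal.

    Hypothetical features: time view = [RMS energy, zero-crossing rate] per frame;
    frequency view = log energy in n_bands equal-width FFT bands per frame.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    zcr = np.mean(np.diff(np.sign(frames), axis=1) != 0, axis=1)
    time_view = np.column_stack([rms, zcr])
    power = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1)) ** 2
    bands = np.array_split(power, n_bands, axis=1)
    freq_view = np.log1p(np.column_stack([b.sum(axis=1) for b in bands]))
    return time_view, freq_view


class Reservoir:
    """Generic leaky echo state network with fixed random weights (never trained)."""

    def __init__(self, n_in, n_res=100, spectral_radius=0.9, leak=0.3, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
        W = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
        # Rescale the recurrent matrix to the target spectral radius (common ESN heuristic).
        self.W = W * (spectral_radius / np.max(np.abs(np.linalg.eigvals(W))))
        self.leak = leak

    def states(self, seq):
        """Drive the reservoir with seq (T x n_in) and return all states (T x n_res)."""
        x = np.zeros(self.W.shape[0])
        out = []
        for u in seq:
            x = (1 - self.leak) * x + self.leak * np.tanh(self.W_in @ u + self.W @ x)
            out.append(x)
        return np.array(out)

    def embed(self, seq):
        """Summarize a variable-length sequence by its time-averaged reservoir state."""
        return self.states(seq).mean(axis=0)


def standardize(seqs):
    """Z-score each feature using statistics pooled over all clips and frames."""
    pooled = np.concatenate(seqs)
    mu, sd = pooled.mean(axis=0), pooled.std(axis=0) + 1e-8
    return [(s - mu) / sd for s in seqs]


def rbf_kernel(Z, gamma):
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def kcca_top_correlation(X, Y, gamma=0.1, kappa=0.1):
    """Largest regularized kernel canonical correlation between paired views X and Y.

    With centered kernels K and R = K (K + kappa I)^-1, the top correlation is the
    largest singular value of Rx @ Ry.
    """
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    Kx = H @ rbf_kernel(X, gamma) @ H
    Ky = H @ rbf_kernel(Y, gamma) @ H
    Rx = Kx @ np.linalg.inv(Kx + kappa * np.eye(n))
    Ry = Ky @ np.linalg.inv(Ky + kappa * np.eye(n))
    return np.linalg.svd(Rx @ Ry, compute_uv=False)[0]


# Usage on synthetic clips: noisy sine tones at random frequencies (0.5 s at 8 kHz).
rng = np.random.default_rng(42)
t = np.arange(4000) / 8000.0
clips = [np.sin(2 * np.pi * f * t) + 0.1 * rng.standard_normal(t.size)
         for f in rng.uniform(100, 2000, size=30)]

views = [frame_features(c) for c in clips]
time_seqs = standardize([v[0] for v in views])
freq_seqs = standardize([v[1] for v in views])

time_res = Reservoir(n_in=2, seed=1)
freq_res = Reservoir(n_in=8, seed=2)
X = np.array([time_res.embed(s) for s in time_seqs])
Y = np.array([freq_res.embed(s) for s in freq_seqs])
rho = kcca_top_correlation(X, Y)
print(f"top kernel canonical correlation: {rho:.3f}")
```

The regularizer `kappa` matters here: with `kappa` close to zero and few samples, kernel CCA can report correlations near 1 for almost any pair of views, so in practice it (and the RBF width `gamma`) would be tuned, for example by cross-validation.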
Pages: 11355+
Number of pages: 10
Related papers
50 in total
  • [41] Digital Audio Forensics: Microphone and Environment Classification Using Deep Learning
    Qamhan, Mustafa A.
    Altaheri, Hamdi
    Meftah, Ali Hamid
    Muhammad, Ghulam
    Alotaibi, Yousef Ajami
    IEEE ACCESS, 2021, 9 : 62719 - 62733
  • [42] Time-frequency visual representation and texture features for audio applications: a comprehensive review, recent trends, and challenges
    Mistry, Yogita D.
    Birajdar, Gajanan K.
    Khodke, Archana M.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (23) : 36143 - 36177
  • [43] Music video emotion classification using slow-fast audio-video network and unsupervised feature representation
    Pandeya, Yagya Raj
    Bhattarai, Bhuwan
    Lee, Joonwhoan
    SCIENTIFIC REPORTS, 2021, 11 (01)
  • [44] AUDIO SYNCHRONISATION WITH A TUNNEL MATRIX FOR TIME SERIES AND DYNAMIC PROGRAMMING
    Gorisch, Jan
    Prevot, Laurent
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 3846 - 3850
  • [45] Audio Replay Attack Detection Using High-Frequency Features
    Witkowski, Marcin
    Kacprzak, Stanislaw
    Zelasko, Piotr
    Kowalczyk, Konrad
    Galka, Jakub
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 27 - 31
  • [46] A Neural Network Approach for Sound Event Detection in Real Life Audio
    Valenti, Michele
    Tonelli, Dario
    Vesperini, Fabio
    Principi, Emanuele
    Squartini, Stefano
    2017 25TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2017, : 2754 - 2758
  • [47] Affective Audio Annotation of Public Speeches with Convolutional Clustering Neural Network
    Xu, Jiahao
    Zhang, Boyan
    Wang, Zhiyong
    Wang, Yang
    Chen, Fang
    Gao, Junbin
    Feng, David Dagan
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2022, 13 (01) : 238 - 249
  • [48] Inter-speaker variability in audio-visual classification of word prominence
    Heckmann, Martin
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1790 - 1794
  • [49] CAS-TJ: Channel attention shuffle and temporal jigsaw for audio classification
    Kim, Yongmin
    Ko, Kyungdeuk
    Lee, Junyeop
    Ko, Hanseok
    APPLIED ACOUSTICS, 2025, 233
  • [50] Audio-visual stimulation based emotion classification by correlated EEG channels
    Ahirwal, Mitul Kumar
    Kose, Mangesh Ramaji
    HEALTH AND TECHNOLOGY, 2020, 10 (01) : 7 - 23