Audio Scanning Network: Bridging Time and Frequency Domains for Audio Classification

Cited by: 0

Authors:

Chen, Liangwei [1]

Zhou, Xiren [2]

Chen, Huanhuan [2]

Affiliations:

[1] Univ Sci & Technol China, Sch Data Sci, Hefei, Peoples R China

[2] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei, Peoples R China

Source:

THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 10 | 2024

Funding:

National Key Research and Development Program of China;

Keywords:

CANONICAL CORRELATION-ANALYSIS; RECOGNITION;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

With the rapid growth of audio data, there is a pressing need for automatic audio classification. As a type of time-series data, audio exhibits waveform fluctuations in both the time and frequency domains that evolve over time, and similar instances share consistent patterns. This study introduces the Audio Scanning Network (ASNet), designed to leverage this abundant information for stable and effective audio classification. ASNet captures real-time changes in audio waveforms across both the time and frequency domains through reservoir computing, supported by Reservoir Kernel Canonical Correlation Analysis (RKCCA), which explores correlations between time-domain and frequency-domain waveform fluctuations. This approach lets ASNet capture both the changes and the inherent correlations within the audio waveform without time-consuming iterative training. Instead of converting audio into spectrograms, ASNet works directly on audio feature sequences to uncover associations between time and frequency fluctuations. Experiments on environmental sound and music genre classification tasks show that ASNet performs comparably to state-of-the-art methods.
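The abstract does not spell out how ASNet's reservoir is built, so the following is only a generic illustration of the reservoir computing idea it mentions, not the authors' architecture, and it does not cover RKCCA. It is a minimal echo-state-style sketch in which fixed random weights turn a feature sequence into a sequence of states. The names `make_reservoir` and `run_reservoir`, the toy features, and all sizes are assumptions made for this example. In reservoir methods of this kind, only a simple readout on top of the states is usually fitted, which is one common reason no iterative training of the recurrent weights is needed.

```python
import math
import random


def make_reservoir(n_inputs, n_units, scale=0.5, seed=0):
    """Draw fixed random input and recurrent weights (hypothetical helper).

    These weights are never trained; they are sampled once and reused.
    """
    rng = random.Random(seed)
    w_in = [[rng.uniform(-scale, scale) for _ in range(n_inputs)]
            for _ in range(n_units)]
    # Each recurrent row sums to at most `scale` in absolute value, so with
    # scale < 1 the influence of old states fades over time.
    bound = scale / n_units
    w_res = [[rng.uniform(-bound, bound) for _ in range(n_units)]
             for _ in range(n_units)]
    return w_in, w_res


def run_reservoir(frames, w_in, w_res):
    """Return the reservoir state after each input frame."""
    n_units = len(w_res)
    state = [0.0] * n_units
    states = []
    for frame in frames:
        new_state = []
        for i in range(n_units):
            drive = sum(w * x for w, x in zip(w_in[i], frame))
            recur = sum(w * s for w, s in zip(w_res[i], state))
            new_state.append(math.tanh(drive + recur))
        state = new_state
        states.append(state)
    return states


# Toy "feature sequence": 20 frames with 3 made-up features each.
frames = [[math.sin(0.3 * t), math.cos(0.3 * t), 0.1 * t] for t in range(20)]
w_in, w_res = make_reservoir(n_inputs=3, n_units=8)
states = run_reservoir(frames, w_in, w_res)
print(len(states), len(states[0]))  # 20 8
```

Each entry of `states` summarises the input history up to that frame, so a downstream classifier could operate on these states rather than on the raw frames.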


Pages: 11355 / +

Page count: 10
