Connectogram - A graph-based time dependent representation for sounds

Cited by: 8
Authors
Turker, Ilker [1]
Aksu, Serkan [2]
Affiliations
[1] Karabuk Univ, Dept Comp Engn, Karabuk, Turkey
[2] Bartin Univ, Dept Comp Technol, Bartin, Turkey
Keywords
Graph representation; Sound classification; Time-series classification; Complex networks; Deep learning; Machine learning; Convolutional neural networks; Series; Classification; Recognition
DOI
10.1016/j.apacoust.2022.108660
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The proposed method contributes to the time-series classification literature with a novel time-convexity based representation, which extends current graph conversion approaches by introducing the time dimension and a colorful graph-generator approach. The representation capability of connectograms is tested against mel-spectrograms (mels) and MFCCs on an environmental sound classification task, as input to state-of-the-art transfer learning models. Results indicate that connectograms cannot compete with the best-performing mel-spectrogram representations in standalone format; however, they significantly improve classification performance when combined with them as single layers of hybrid RGB representations. A combination of [mels + mels + connectogram] outperforms either the sole representations or their other combinations by 2-3%, reaching 96.46% classification accuracy with the ResNet50 classifier model. (c) 2022 Elsevier Ltd. All rights reserved.
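The hybrid RGB idea in the abstract, where each representation fills one color layer, can be sketched as follows. This is a minimal illustration only: the abstract does not state the normalization, the channel order, or the image size, so the min-max scaling, the function names `to_uint8` and `stack_rgb`, and the toy 2 x 2 matrices are all assumptions, not the authors' pipeline.

```python
from typing import List

Matrix = List[List[float]]


def to_uint8(channel: Matrix) -> List[List[int]]:
    """Min-max scale a 2D matrix to integers in 0..255 (assumed normalization)."""
    flat = [v for row in channel for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant matrix carries no contrast; map it to all zeros.
        return [[0 for _ in row] for row in channel]
    return [[round(255 * (v - lo) / span) for v in row] for row in channel]


def stack_rgb(red: Matrix, green: Matrix, blue: Matrix) -> List[List[List[int]]]:
    """Stack three equally sized 2D matrices into an H x W x 3 image."""
    shapes = {(len(m), len(m[0])) for m in (red, green, blue)}
    if len(shapes) != 1:
        raise ValueError("all three layers must share the same shape")
    r, g, b = to_uint8(red), to_uint8(green), to_uint8(blue)
    return [
        [[r[i][j], g[i][j], b[i][j]] for j in range(len(r[0]))]
        for i in range(len(r))
    ]


# Toy 2 x 2 stand-ins for real feature matrices (values are made up).
mels = [[0.0, 1.0], [2.0, 4.0]]
connectogram = [[10.0, 20.0], [30.0, 50.0]]

# [mels + mels + connectogram]: two layers of mels, one connectogram layer.
hybrid = stack_rgb(mels, mels, connectogram)
```

In practice the three layers would first be resized to the input resolution expected by the classifier; that step is omitted here because the abstract does not describe it.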
Pages: 8