Fourier Hilbert: The input transformation to enhance CNN models for speech emotion recognition

Cited by: 0
Authors
Ly, Bao Long [1]
Affiliations
[1] FPT University, Ho Chi Minh City
Source
Cognitive Robotics | 2024 / Vol. 4
Keywords
CNN; Enhancing; Fourier transformation; Hilbert curve; Input transformation; Signal processing; Speech emotion recognition
DOI
10.1016/j.cogr.2024.11.002
Abstract
Signal processing in general, and speech emotion recognition in particular, are long-established Artificial Intelligence (AI) tasks. With the explosion of deep learning, CNN models are used ever more frequently, accompanied by the emergence of many signal transformations. However, these methods often require significant hardware and runtime. To address these issues, we analyze and learn from existing transformations, leading us to propose a new method: the Fourier Hilbert Transformation (FHT). In essence, this method applies the Hilbert curve to Fourier images. The resulting images are small and dense, a shape well suited to CNN architectures. In addition, the better distribution of information across the image allows the convolutional filters to exploit their full power. These points support the argument that FHT provides an optimal input for CNNs. Experiments conducted on popular datasets yielded promising results: FHT saves a large amount of hardware usage and runtime while maintaining high performance, and even offers greater stability than existing methods. This opens up opportunities for deploying signal processing tasks on real-time systems with limited hardware. © 2024
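The abstract's core idea — laying a 1D Fourier spectrum out on a 2D grid along a Hilbert curve, so that neighboring frequency bins stay spatially close for the CNN's local filters — can be sketched roughly as follows. The paper itself does not publish code, so every function name and parameter below (`d2xy`, `fourier_hilbert_image`, the `order` of the curve, the FFT sizing) is an assumption for illustration, not the authors' actual implementation.

```python
import numpy as np

def d2xy(n, d):
    """Map index d along a Hilbert curve to (x, y) on an n x n grid.

    n must be a power of two. Uses the standard iterative
    quadrant-rotation construction of the Hilbert curve.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate the quadrant when moving horizontally
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def fourier_hilbert_image(signal, order=5):
    """Toy FHT-style transform: FFT magnitude spectrum -> Hilbert-ordered image.

    order controls the image side length n = 2**order; the first n*n
    magnitude bins are written into the image along the Hilbert curve,
    so adjacent frequencies land in adjacent pixels.
    """
    n = 1 << order
    num = n * n
    # Real FFT sized so that at least num magnitude bins are available.
    spec = np.abs(np.fft.rfft(signal, n=2 * num))[:num]
    img = np.zeros((n, n))
    for d in range(num):
        x, y = d2xy(n, d)
        img[y, x] = spec[d]
    return img
```

Because the Hilbert curve preserves locality, a small convolutional filter sliding over such an image sees frequency bins that are also close in the spectrum — the property the abstract credits for letting the filters "fully utilize their power" on a small, dense input.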
Pages: 228-236
Page count: 8