An octonion-based nonlinear echo state network for speech emotion recognition in Metaverse

Cited by: 29
Authors
Daneshfar, Fatemeh [1]
Jamshidi, Mohammad [2]
Affiliations
[1] Univ Kurdistan, Dept Comp Engn, Sanandaj, Iran
[2] Univ West Bohemia, Fac Elect Engn, Plzen, Czech Republic
Keywords
Speech emotion recognition; Digital twins; Metaverse; Octonion algebra; Echo state network; Machine learning; FEATURE-SELECTION; FEATURES; FUSION; MODEL;
DOI
10.1016/j.neunet.2023.03.026
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
While the Metaverse is becoming a popular trend and drawing much attention from academia, society, and businesses, the processing cores used in its infrastructures need to be improved, particularly in terms of signal processing and pattern recognition. Accordingly, speech emotion recognition (SER) plays a crucial role in making Metaverse platforms more usable and enjoyable for their users. However, existing SER methods continue to be plagued by two significant problems in the online environment. The first is the lack of adequate engagement and customization between avatars and users; the second is the complexity of SER in the Metaverse, where systems deal with both people and their digital twins or avatars. This is why developing efficient machine learning (ML) techniques tailored to hypercomplex signal processing is essential to enhance the impressiveness and tangibility of Metaverse platforms. As a solution, echo state networks (ESNs), a powerful ML tool for SER, can be an appropriate technique to strengthen the Metaverse's foundations in this area. Nevertheless, ESNs have technical issues that prevent precise and reliable analysis, especially for high-dimensional data. The most significant limitation of these networks is the high memory consumption caused by their reservoir structure when faced with high-dimensional signals. To address these problems and the challenges of applying ESNs in the Metaverse, we propose NO2GESNet, a novel ESN structure empowered by octonion algebra. Octonion numbers have eight dimensions and represent high-dimensional data compactly, improving the network's precision and performance compared with conventional ESNs. The proposed network also addresses the weakness of ESNs in presenting higher-order statistics to the output layer by equipping it with a multidimensional bilinear filter.
Three comprehensive scenarios for using the proposed network in the Metaverse have been designed and analyzed; they show not only the accuracy and performance of the proposed approach but also how SER can be employed in Metaverse platforms.
© 2023 Elsevier Ltd. All rights reserved.
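The abstract rests on octonion algebra, whose elements have eight real components. As an illustration only, here is a minimal Python sketch of octonion multiplication built with the Cayley-Dickson construction, assuming the sign convention (a, b)(c, d) = (ac - d*b, da + bc*); other conventions and basis orderings exist. This is not the NO2GESNet implementation, which the abstract does not describe, and the helper names (`mul`, `conj`, `norm`, `basis`) are hypothetical. The sketch shows what sets octonions apart from real numbers: the product is neither commutative nor associative, yet the norm is multiplicative.

```python
import math
from typing import Tuple

# A hypercomplex number stored as 2**n real components (n = 3 for octonions).
Vec = Tuple[float, ...]


def add(x: Vec, y: Vec) -> Vec:
    return tuple(p + q for p, q in zip(x, y))


def sub(x: Vec, y: Vec) -> Vec:
    return tuple(p - q for p, q in zip(x, y))


def conj(x: Vec) -> Vec:
    """Cayley-Dickson conjugate: (a, b)* = (a*, -b); reals are self-conjugate."""
    if len(x) == 1:
        return x
    h = len(x) // 2
    return conj(x[:h]) + tuple(-v for v in x[h:])


def mul(x: Vec, y: Vec) -> Vec:
    """Recursive Cayley-Dickson product (a, b)(c, d) = (ac - d*b, da + bc*).

    Reals -> complex -> quaternions -> octonions, halving the length each step.
    """
    if len(x) == 1:
        return (x[0] * y[0],)
    h = len(x) // 2
    a, b = x[:h], x[h:]
    c, d = y[:h], y[h:]
    return sub(mul(a, c), mul(conj(d), b)) + add(mul(d, a), mul(b, conj(c)))


def norm(x: Vec) -> float:
    """Euclidean norm of the eight real components."""
    return math.sqrt(sum(v * v for v in x))


def basis(i: int) -> Vec:
    """Unit octonion e_i (e_0 = 1) as an 8-tuple."""
    return tuple(1.0 if k == i else 0.0 for k in range(8))


if __name__ == "__main__":
    e1, e2, e4 = basis(1), basis(2), basis(4)
    # Non-commutative: e1*e2 = e3 but e2*e1 = -e3 under this convention.
    print(mul(e1, e2) == mul(e2, e1))  # False
    # Non-associative: (e1*e2)*e4 = e7 while e1*(e2*e4) = -e7.
    print(mul(mul(e1, e2), e4) == mul(e1, mul(e2, e4)))  # False
```

The norm identity |xy| = |x||y| holds because the octonions form a composition algebra. Because the product is non-associative, any octonion-valued layer has to fix the order in which products are evaluated.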
Pages: 108-121
Number of pages: 14