A Parallel Multi-Modal Factorized Bilinear Pooling Fusion Method Based on the Semi-Tensor Product for Emotion Recognition

Times cited: 3

Authors:

Liu, Fen [1, 2]

Chen, Jianfeng [1]

Li, Kemeng [1]

Tan, Weijie [3]

Cai, Chang [1]

Ayub, Muhammad Saad [1]

Affiliations:

[1] Northwestern Polytech Univ, Sch Marine Sci & Technol, Xian 710072, Peoples R China

[2] Yanan Univ, Coll Math & Comp Sci, Yanan 716000, Peoples R China

[3] Guizhou Univ, Coll Comp Sci & Technol, State Key Lab Publ Big Data, Guiyang 550025, Peoples R China

Source:

ENTROPY | 2022 / Vol. 24 / No. 12

Keywords:

multi-modal information fusion; semi-tensor product; emotion recognition; low-rank matrix; INFORMATION FUSION; REPRESENTATION;

DOI:

10.3390/e24121836

Chinese Library Classification (CLC) number:

O4 [Physics]

Discipline classification code:

0702

Abstract:

Multi-modal fusion can exploit complementary information from various modalities and improve the accuracy of prediction or classification tasks. In this paper, we propose a parallel multi-modal factorized bilinear pooling method based on the semi-tensor product (STP) for information fusion in emotion recognition. First, we apply the STP to factorize a high-dimensional weight matrix into two low-rank factor matrices without dimension-matching constraints. Next, we project the multi-modal features onto the low-dimensional matrices and multiply them using the STP to capture rich interactions between the features. Finally, we apply an STP-pooling method to reduce the dimensionality and obtain the final features. The method fuses information across modalities of different scales and dimensions and avoids the data redundancy that dimension matching would otherwise introduce. Experiments on the emotion-recognition task using the IEMOCAP and CMU-MOSI datasets showed a significant reduction in storage space and recognition time. The results also indicate that the proposed method improves recognition performance while reducing both the training time and the number of parameters.
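The abstract's central tool is the semi-tensor product, which multiplies matrices whose inner dimensions do not match. As a minimal sketch, assuming the standard left STP definition introduced by Daizhan Cheng, A ⋉ B = (A ⊗ I_{t/n})(B ⊗ I_{t/p}) with A of size m × n, B of size p × q and t = lcm(n, p), the NumPy code below checks that the STP reduces to ordinary multiplication when n = p, and shows how a 6 × 8 weight matrix can be stored as 6 × 4 and 2 × 4 factors. The shapes, the random data and the helper `bilinear_score` are illustrative assumptions; this is not the authors' fusion network or their STP-pooling step.

```python
import numpy as np
from math import lcm  # Python 3.9+


def stp(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Left semi-tensor product A ⋉ B (standard definition, assumed here).

    For A (m x n) and B (p x q), with t = lcm(n, p):
        A ⋉ B = (A ⊗ I_{t/n}) (B ⊗ I_{t/p})
    The result has shape (m*t/n) x (q*t/p); when n == p it equals A @ B.
    """
    _, n = a.shape
    p, _ = b.shape
    t = lcm(n, p)
    return np.kron(a, np.eye(t // n)) @ np.kron(b, np.eye(t // p))


def bilinear_score(x: np.ndarray, y: np.ndarray,
                   u: np.ndarray, v: np.ndarray) -> float:
    """Hypothetical helper: bilinear interaction x^T W y with W = U ⋉ V."""
    w = stp(u, v)
    return float(x @ w @ y)


rng = np.random.default_rng(0)

# 1) Matching inner dimensions: the STP is the ordinary matrix product.
a = rng.standard_normal((3, 4))
b = rng.standard_normal((4, 5))
assert np.allclose(stp(a, b), a @ b)

# 2) Mismatched inner dimensions: U is 6x4, V is 2x4 (4 is a multiple of 2),
#    yet U ⋉ V is a full 6x8 weight matrix.
u = rng.standard_normal((6, 4))
v = rng.standard_normal((2, 4))
w = stp(u, v)
print(w.shape)                        # (6, 8)
print(u.size + v.size, "vs", w.size)  # 32 vs 48 stored values

# 3) Bilinear interaction between a 6-dim and an 8-dim feature vector
#    (e.g. one audio and one text embedding; dimensions are made up).
x = rng.standard_normal(6)
y = rng.standard_normal(8)
print(bilinear_score(x, y, u, v))
```

Here the second factor V stores 8 values, whereas an ordinary rank-4 factor of shape 4 × 8 would need 32. This is the kind of parameter saving the abstract attributes to dropping the dimension-matching constraint; the paper's actual factor shapes are not given in this record.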

Pages: 14

Number of references: 40
