Husformer: A Multimodal Transformer for Multimodal Human State Recognition

Times Cited: 13
Authors
Wang, Ruiqi [1 ]
Jo, Wonse [1 ]
Zhao, Dezhong [2 ]
Wang, Weizheng [1 ]
Gupte, Arjun [1 ]
Yang, Baijian [1 ]
Chen, Guohua [2 ]
Min, Byung-Cheol [1 ]
Affiliations
[1] Purdue Univ, Dept Comp & Informat Technol, W Lafayette, IN 47907 USA
[2] Beijing Univ Chem Technol, Coll Mech & Elect Engn, Beijing 100029, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Cognitive load recognition; cross-modal attention; emotion prediction; multimodal fusion; transformer;
DOI
10.1109/TCDS.2024.3357618
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Human state recognition is a critical topic with pervasive and important applications in human-machine systems. Multimodal fusion, which integrates metrics from various data sources, has proven to be a potent method for boosting recognition performance. Although recent multimodal-based models have shown promising results, they often fall short of fully leveraging the sophisticated fusion strategies needed to model adequate cross-modal dependencies in the fusion representation; instead, they rely on costly and inconsistent feature crafting and alignment. To address this limitation, we propose Husformer, an end-to-end multimodal transformer framework for multimodal human state recognition. Specifically, we propose using cross-modal transformers, which enable one modality to reinforce itself by directly attending to latent relevance revealed in other modalities, to fuse different modalities while ensuring sufficient awareness of the cross-modal interactions introduced. Subsequently, we utilize a self-attention transformer to further prioritize contextual information in the fusion representation. Extensive experiments on two human emotion corpora (DEAP and WESAD) and two cognitive load datasets [the Multimodal Dataset for Objective Cognitive Workload Assessment on Simultaneous Tasks (MOCAS) and CogLoad] demonstrate that Husformer outperforms both state-of-the-art multimodal baselines and single-modality use by a large margin in human state recognition, especially when dealing with raw multimodal features. We also conducted an ablation study to show the benefit of each component in Husformer. Experimental details and source code are available at https://github.com/SMARTlab-Purdue/Husformer.
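The fusion scheme summarized in the abstract (per-modality projections, cross-modal attention in which one modality queries the others, then a self-attention transformer over the fused representation) can be illustrated with a minimal PyTorch sketch. This is not the authors' released implementation (see the GitHub link above); the modality dimensions, layer sizes, single-layer cross-modal blocks, and mean-pool classification head are illustrative assumptions.

# Minimal sketch of a Husformer-style fusion pipeline (assumed details, not the official code).
import torch
import torch.nn as nn


class CrossModalBlock(nn.Module):
    """One cross-modal attention block: the target modality queries a source modality."""

    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, target: torch.Tensor, source: torch.Tensor) -> torch.Tensor:
        # Queries come from the target modality; keys/values from the source modality,
        # so the target is reinforced by directly attending to relevance in the source.
        attended, _ = self.attn(query=target, key=source, value=source)
        x = self.norm1(target + attended)
        return self.norm2(x + self.ff(x))


class HusformerSketch(nn.Module):
    def __init__(self, input_dims, d_model: int = 64, n_heads: int = 4, num_classes: int = 3):
        super().__init__()
        # 1x1 convolutions project raw per-modality features to a shared dimension (assumed).
        self.proj = nn.ModuleList(nn.Conv1d(d, d_model, kernel_size=1) for d in input_dims)
        n = len(input_dims)
        # One cross-modal block per ordered (target, source) pair of distinct modalities.
        self.cross = nn.ModuleList(CrossModalBlock(d_model, n_heads)
                                   for _ in range(n * (n - 1)))
        encoder_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.self_attn = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, modalities):
        # modalities: list of (batch, seq_len_i, dim_i) tensors, one per modality.
        feats = [p(m.transpose(1, 2)).transpose(1, 2) for p, m in zip(self.proj, modalities)]
        fused, k = [], 0
        for i, tgt in enumerate(feats):
            for j, src in enumerate(feats):
                if i == j:
                    continue
                fused.append(self.cross[k](tgt, src))  # modality i attends to modality j
                k += 1
        # Self-attention transformer prioritizes context in the concatenated fusion representation.
        fusion = self.self_attn(torch.cat(fused, dim=1))
        return self.head(fusion.mean(dim=1))  # mean-pool then classify (assumed head)


if __name__ == "__main__":
    # Toy usage: three modalities (e.g., EEG, ECG, EDA features) with different feature sizes.
    model = HusformerSketch(input_dims=[32, 8, 4], num_classes=3)
    x = [torch.randn(2, 50, 32), torch.randn(2, 50, 8), torch.randn(2, 50, 4)]
    print(model(x).shape)  # torch.Size([2, 3])

The key design point carried over from the abstract is that fusion happens through attention rather than through hand-crafted feature alignment: each modality's sequence is enriched by every other modality before a standard transformer encoder models context across the fused sequence.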
Pages: 1374-1390
Number of pages: 17