Multi-agent Transformer Networks for Multimodal Human Activity Recognition

Cited by: 4
Authors
Li, Jingcheng [1]
Yao, Lina [2]
Li, Binghao [1]
Wang, Xianzhi [3]
Sammut, Claude [1]
Affiliations
[1] University of New South Wales, Sydney, NSW, Australia
[2] University of New South Wales and CSIRO Data61, Sydney, NSW, Australia
[3] University of Technology Sydney, Sydney, NSW, Australia
Source
Proceedings of the 31st ACM International Conference on Information and Knowledge Management (CIKM 2022), 2022
Keywords
Activity recognition; neural networks; multi-agent reinforcement learning; multimodal learning
DOI
10.1145/3511808.3557402
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Human activity recognition has long been an important open challenge with promising benefits in a wide range of applications. Existing approaches have made great progress by applying deep learning and attention-based methods. However, deep learning-based approaches may not fully exploit the available features in multimodal human activity recognition tasks, and the potential of attention-based methods to extract multimodal spatial-temporal relationships and produce robust results remains underexplored. In this work, we propose the Multi-agent Transformer Network (MATN), a multi-agent attention-based deep learning algorithm, to address these issues in multimodal human activity recognition. We first design a unified representation learning layer that encodes the multimodal data, preprocessing it in a generalized and efficient way. We then develop a multimodal spatial-temporal transformer module that applies the attention mechanism to extract salient spatial-temporal features. Finally, we use a multi-agent training module to collaboratively select the informative modalities and predict the activity labels. We conduct extensive experiments to evaluate MATN on two public multimodal human activity recognition datasets. The results show that our model achieves competitive performance compared to state-of-the-art approaches while also demonstrating scalability, effectiveness, and robustness.
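The abstract describes a three-stage pipeline: unified multimodal encoding, a spatial-temporal transformer, and multi-agent modality selection. The following is a minimal, hypothetical PyTorch sketch of such a pipeline, for orientation only. All class names, shapes, and hyperparameters are assumptions rather than the authors' implementation, and the paper's multi-agent reinforcement-learning selection step is approximated here by simple soft per-modality gating; see the paper (DOI above) for the actual method.

```python
# Minimal sketch of a MATN-style pipeline, assuming per-modality
# (batch, time, features) inputs. Illustrative only: module names,
# shapes, and hyperparameters are invented, and the paper's
# multi-agent reinforcement learning is replaced by soft gating.
import torch
import torch.nn as nn


class UnifiedRepresentation(nn.Module):
    """Project each modality's raw features into a shared embedding space."""

    def __init__(self, input_dims, d_model):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(d, d_model) for d in input_dims)

    def forward(self, modalities):
        # modalities: list of (batch, time, feat_m) tensors, one per modality
        return [p(x) for p, x in zip(self.proj, modalities)]


class SpatialTemporalTransformer(nn.Module):
    """Self-attention over each modality's sequence to surface salient steps."""

    def __init__(self, d_model, nhead=4, num_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, x):
        # x: (batch, time, d_model) -> temporally pooled (batch, d_model)
        return self.encoder(x).mean(dim=1)


class MATNSketch(nn.Module):
    def __init__(self, input_dims, d_model, num_classes):
        super().__init__()
        self.embed = UnifiedRepresentation(input_dims, d_model)
        self.transformer = SpatialTemporalTransformer(d_model)
        # One "agent" per modality scores how informative that modality is.
        self.agents = nn.ModuleList(nn.Linear(d_model, 1) for _ in input_dims)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, modalities):
        feats = [self.transformer(x) for x in self.embed(modalities)]
        scores = torch.cat([a(f) for a, f in zip(self.agents, feats)], dim=1)
        weights = torch.softmax(scores, dim=1)  # (batch, num_modalities)
        fused = sum(w.unsqueeze(-1) * f
                    for w, f in zip(weights.unbind(dim=1), feats))
        return self.classifier(fused)


# Example: two modalities (say, a 6-channel accelerometer and a 3-axis gyroscope)
model = MATNSketch(input_dims=[6, 3], d_model=64, num_classes=12)
acc, gyr = torch.randn(8, 100, 6), torch.randn(8, 100, 3)
logits = model([acc, gyr])
print(logits.shape)  # torch.Size([8, 12])
```

The softmax gating stands in for the cooperative agents' selection policy; in the paper that selection is learned collaboratively with multi-agent reinforcement learning rather than by end-to-end gating.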
Pages: 1135-1145
Page count: 11