Cosmo: Contrastive Fusion Learning with Small Data for Multimodal Human Activity Recognition

Cited by: 33
Authors
Ouyang, Xiaomin [1 ]
Shuai, Xian [1 ]
Zhou, Jiayu [2 ]
Shi, Ivy Wang [3 ]
Xie, Zhiyuan [1 ]
Xing, Guoliang [1 ]
Huang, Jianwei [4 ,5 ]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] Michigan State Univ, E Lansing, MI USA
[3] Li Chun United World Coll, Hong Kong, Peoples R China
[4] Chinese Univ Hong Kong, Shenzhen, Peoples R China
[5] Shenzhen Inst Artificial Intelligence & Robot Soc, Shenzhen, Peoples R China
Source
Proceedings of the 28th Annual International Conference on Mobile Computing and Networking (ACM MobiCom 2022), 2022
Keywords
Human activity recognition; Heterogeneous multimodal fusion; Contrastive learning
DOI: 10.1145/3495243.3560519
CLC number
TP301 [Theory, Methods]
Discipline code
081202
Abstract
Human activity recognition (HAR) is a key enabling technology for a wide range of emerging applications. Although multimodal sensing systems are essential for capturing complex and dynamic human activities in real-world settings, they bring several new challenges including limited labeled multimodal data. In this paper, we propose Cosmo, a new system for contrastive fusion learning with small data in multimodal HAR applications. Cosmo features a novel two-stage training strategy that leverages both unlabeled data on the cloud and limited labeled data on the edge. By integrating novel fusion-based contrastive learning and quality-guided attention mechanisms, Cosmo can effectively extract both consistent and complementary information across different modalities for efficient fusion. Our evaluation on a cloud-edge testbed using two public datasets and a new multimodal HAR dataset shows that Cosmo delivers significant improvement over state-of-the-art baselines in both recognition accuracy and convergence delay.
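The abstract describes fusion-based contrastive learning, in which fused multimodal embeddings of the same sample act as positive pairs. The following is a minimal, hypothetical sketch of such an objective in Python/NumPy; the fuse and info_nce helpers, the random fusion weights, and all parameter values are illustrative assumptions, not the authors' implementation.

# Illustrative sketch of a fusion-based contrastive (InfoNCE-style) objective
# over fused multimodal embeddings. This is NOT Cosmo's actual implementation;
# all names and values here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def fuse(features, weights):
    """Weighted fusion of per-modality feature matrices of shape (batch, dim)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                       # normalize the fusion weights
    return sum(wi * f for wi, f in zip(w, features))

def info_nce(z1, z2, temperature=0.1):
    """Contrastive loss where matching rows of z1 and z2 are positive pairs."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature      # (batch, batch) cosine-similarity logits
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))    # positives lie on the diagonal

# Toy batch: two modalities (e.g., IMU and depth features), 8 samples, 16 dims.
batch, dim = 8, 16
imu = rng.normal(size=(batch, dim))
depth = rng.normal(size=(batch, dim))

# Two fused "views" of the same samples, produced with different random fusion
# weights; minimizing the loss pulls the two fused views of each sample together.
view_a = fuse([imu, depth], rng.uniform(size=2))
view_b = fuse([imu, depth], rng.uniform(size=2))
print(f"fusion-based contrastive loss: {info_nce(view_a, view_b):.4f}")

Under this reading, the loss requires no labels, which is consistent with the abstract's first-stage training on unlabeled cloud data; the quality-guided attention stage would then learn per-modality weights from the limited labeled data on the edge.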
Pages: 324-337
Page count: 14