Cosmo: Contrastive Fusion Learning with Small Data for Multimodal Human Activity Recognition

Cited by: 33
Authors
Ouyang, Xiaomin [1 ]
Shuai, Xian [1 ]
Zhou, Jiayu [2 ]
Shi, Ivy Wang [3 ]
Xie, Zhiyuan [1 ]
Xing, Guoliang [1 ]
Huang, Jianwei [4 ,5 ]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] Michigan State Univ, E Lansing, MI USA
[3] Li Chun United World Coll, Hong Kong, Peoples R China
[4] Chinese Univ Hong Kong, Shenzhen, Peoples R China
[5] Shenzhen Inst Artificial Intelligence & Robot Soc, Shenzhen, Peoples R China
Source
Proceedings of the 28th Annual International Conference on Mobile Computing and Networking (ACM MobiCom 2022), 2022
Keywords
Human activity recognition; Heterogeneous multimodal fusion; Contrastive learning
DOI: 10.1145/3495243.3560519
CLC number
TP301 [Theory, Methods]
Discipline code
081202
Abstract
Human activity recognition (HAR) is a key enabling technology for a wide range of emerging applications. Although multimodal sensing systems are essential for capturing complex and dynamic human activities in real-world settings, they bring several new challenges including limited labeled multimodal data. In this paper, we propose Cosmo, a new system for contrastive fusion learning with small data in multimodal HAR applications. Cosmo features a novel two-stage training strategy that leverages both unlabeled data on the cloud and limited labeled data on the edge. By integrating novel fusion-based contrastive learning and quality-guided attention mechanisms, Cosmo can effectively extract both consistent and complementary information across different modalities for efficient fusion. Our evaluation on a cloud-edge testbed using two public datasets and a new multimodal HAR dataset shows that Cosmo delivers significant improvement over state-of-the-art baselines in both recognition accuracy and convergence delay.
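The abstract describes fusion-based contrastive learning, in which fused multimodal embeddings of the same sample act as positive pairs. The following is a minimal, hypothetical sketch of such an objective in Python/NumPy; the fuse and info_nce helpers, the random fusion weights, and all parameter values are illustrative assumptions, not the authors' implementation.

# Illustrative sketch of a fusion-based contrastive (InfoNCE-style) objective
# over fused multimodal embeddings. This is NOT Cosmo's actual implementation;
# all names and values here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def fuse(features, weights):
    """Weighted fusion of per-modality feature matrices of shape (batch, dim)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                       # normalize the fusion weights
    return sum(wi * f for wi, f in zip(w, features))

def info_nce(z1, z2, temperature=0.1):
    """Contrastive loss where matching rows of z1 and z2 are positive pairs."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature      # (batch, batch) cosine-similarity logits
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))    # positives lie on the diagonal

# Toy batch: two modalities (e.g., IMU and depth features), 8 samples, 16 dims.
batch, dim = 8, 16
imu = rng.normal(size=(batch, dim))
depth = rng.normal(size=(batch, dim))

# Two fused "views" of the same samples, produced with different random fusion
# weights; minimizing the loss pulls the two fused views of each sample together.
view_a = fuse([imu, depth], rng.uniform(size=2))
view_b = fuse([imu, depth], rng.uniform(size=2))
print(f"fusion-based contrastive loss: {info_nce(view_a, view_b):.4f}")

Under this reading, the loss requires no labels, which is consistent with the abstract's first-stage training on unlabeled cloud data; the quality-guided attention stage would then learn per-modality weights from the limited labeled data on the edge.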
Pages: 324-337
Page count: 14