CSI-Based Location-Independent Human Activity Recognition by Contrast Between Dual Stream Fusion Features

Cited by: 0
Authors
Wang, Yujie [1 ]
Yu, Guangwei [2 ]
Zhang, Yong [2 ]
Liu, Dun [2 ]
Zhang, Yang [3 ]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Comp & Commun Engn, Beijing 100083, Peoples R China
[2] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230001, Peoples R China
[3] Univ Manchester, Sch Comp Sci, Manchester M13 9PL, England
Funding
National Natural Science Foundation of China;
Keywords
Contrastive learning; channel state information (CSI); feature fusion; recognition;
DOI
10.1109/JSEN.2024.3504005
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Discipline codes
0808; 0809;
Abstract
Because channel state information (CSI) data encodes both activity and environmental information, the features of the same activity vary significantly across locations. Existing CSI-based human activity recognition (HAR) systems achieve high recognition accuracy at training locations, and can learn new activities through mechanisms such as transfer learning and few-shot learning, but they struggle to maintain accurate recognition at other locations. In this article, we propose a contrastive fusion feature-based location-independent HAR (CFLH) system to address this issue. Unlike existing methods that train the feature extractor and a fully connected classifier simultaneously, the CFLH system decouples the training of the feature extractor from that of the classifier: the feature extractor is optimized solely with a contrastive loss computed at the feature level. To construct positive samples, the CFLH system randomly scales activity signals along the temporal dimension, enriching intra- and interclass features across different locations. Using labels, samples from different activity categories are treated as negative samples to widen interclass feature differences. For more effective activity feature extraction, the CFLH system employs a two-tower transformer to extract temporal-stream and channel-stream features, which are then fused into a dual-stream fusion feature by an attention and residual-based fusion module (AR-Fusion). Experimental results show that, when the feature extractor is trained with samples of three activities collected at 12 points and the classifier is then trained with samples of three new activities added at the training points, the highest recognition accuracy at the testing location reaches 94.48% for the three new activities and 95.71% for the old ones.
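The positive-sample construction described in the abstract, randomly scaling an activity signal along the temporal dimension, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the scale range, window length (128 packets), and subcarrier count (30) are assumptions, and linear interpolation stands in for whatever resampling the authors use.

```python
import numpy as np

def random_temporal_scale(signal, scale_range=(0.8, 1.2), rng=None):
    """Stretch/compress a CSI window along time by a random factor via
    linear interpolation, then resample back to the original length so the
    positive pair matches the anchor's shape. `signal` is (T, C):
    T time steps, C subcarrier channels."""
    rng = rng or np.random.default_rng()
    t, c = signal.shape
    factor = rng.uniform(*scale_range)
    new_t = max(2, int(round(t * factor)))
    # sample positions in the original time axis for the scaled sequence
    src = np.linspace(0, t - 1, new_t)
    scaled = np.stack(
        [np.interp(src, np.arange(t), signal[:, j]) for j in range(c)], axis=1
    )
    # resample back to T time steps
    back = np.linspace(0, new_t - 1, t)
    return np.stack(
        [np.interp(back, np.arange(new_t), scaled[:, j]) for j in range(c)], axis=1
    )

x = np.random.default_rng(0).normal(size=(128, 30))  # synthetic CSI window
pos = random_temporal_scale(x)  # temporally scaled positive sample for x
print(pos.shape)  # (128, 30)
```

In a contrastive setup of the kind the abstract outlines, `(x, pos)` would form a positive pair, while labeled samples from other activity categories serve as negatives.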
Pages: 4897-4907
Page count: 11