Identifying Label Noise in Time-Series Datasets

Cited by: 11

Authors:

Atkinson, Gentry [1]

Metsis, Vangelis [1]

Affiliations:

[1] Texas State Univ, San Marcos, TX 78666 USA

Source:

UBICOMP/ISWC '20 ADJUNCT: PROCEEDINGS OF THE 2020 ACM INTERNATIONAL JOINT CONFERENCE ON PERVASIVE AND UBIQUITOUS COMPUTING AND PROCEEDINGS OF THE 2020 ACM INTERNATIONAL SYMPOSIUM ON WEARABLE COMPUTERS | 2020

Keywords:

Label cleaning; neural networks; time-series data; CNN; accelerometer; human activity recognition; label noise;

DOI:

10.1145/3410530.3414366

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Reliably labeled datasets are crucial to the performance of supervised learning methods. Time-series data pose additional challenges. Data points lying on borders between classes can be mislabeled due to perception limitations of human labelers. Sensor measurements may not be directly interpretable by humans. Thus label noise cannot be manually removed. As a result, time-series datasets often contain a significant amount of label noise that can degrade the performance of machine learning models. This work focuses on label noise identification and removal by extending previous methods developed for static instances to the domain of time-series data. We use a combination of deep learning and visualization algorithms to facilitate automatic noise removal. We show that our approach can identify mislabeled instances, which results in improved classification accuracy on four synthetic and two real publicly available human activity datasets.

Citation

Pages: 238-243

Page count: 6

15 entries in total

[1] Identifying mislabeled training data
Brodley, CE
Friedl, MA
[J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 1999, 11 : 131 - 167
[2] Time Series FeatuRe Extraction on basis of Scalable Hypothesis tests (tsfresh - A Python package)
Christ, Maximilian
Braun, Nils
Neuffer, Julius
Kempa-Liehr, Andreas W.
[J]. NEUROCOMPUTING, 2018, 307 : 72 - 77
[3] Frenay B., 2014, EUR S ART NEUR NETW
[4] Classification in the Presence of Label Noise: a Survey
Frenay, Benoit
Verleysen, Michel
[J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2014, 25 (05) : 845 - 869
[5] The University of Sussex-Huawei Locomotion and Transportation Dataset for Multimodal Analytics With Mobile Devices
Gjoreski, Hristijan
Ciliberto, Mathias
Wang, Lin
Morales, Francisco Javier Ordonez
Mekki, Sami
Valentin, Stefan
Roggen, Daniel
[J]. IEEE ACCESS, 2018, 6 : 42592 - 42604
[6] A Survey of mislabeled training data detection techniques for pattern classification
Guan, Donghai
Yuan, Weiwei
[J]. IETE TECHNICAL REVIEW, 2013, 30 (06) : 524 - 530
[7] Kadous MW, 1999, MACHINE LEARNING, PROCEEDINGS, P454
[8] Handling Annotation Uncertainty in Human Activity Recognition
Kwon, Hyeokhyen
Abowd, Gregory D.
Plotz, Thomas
[J]. ISWC'19: PROCEEDINGS OF THE 2019 ACM INTERNATIONAL SYMPOSIUM ON WEARABLE COMPUTERS, 2019, : 109 - 117
[9] Gradient-based learning applied to document recognition
Lecun, Y
Bottou, L
Bengio, Y
Haffner, P
[J]. PROCEEDINGS OF THE IEEE, 1998, 86 (11) : 2278 - 2324
[10] Lee SM, 2017, INT CONF BIG DATA, P131, DOI 10.1109/BIGCOMP.2017.7881728
