Identifying Label Noise in Time-Series Datasets

Times cited: 11
Authors
Atkinson, Gentry [1]
Metsis, Vangelis [1]
Affiliation
[1] Texas State Univ, San Marcos, TX 78666 USA
Source
UBICOMP/ISWC '20 ADJUNCT: PROCEEDINGS OF THE 2020 ACM INTERNATIONAL JOINT CONFERENCE ON PERVASIVE AND UBIQUITOUS COMPUTING AND PROCEEDINGS OF THE 2020 ACM INTERNATIONAL SYMPOSIUM ON WEARABLE COMPUTERS | 2020
Keywords
Label cleaning; neural networks; time-series data; CNN; accelerometer; human activity recognition; label noise
DOI
10.1145/3410530.3414366
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Reliably labeled datasets are crucial to the performance of supervised learning methods, and time-series data pose additional challenges. Data points lying on the borders between classes can be mislabeled because of the perceptual limitations of human labelers. Moreover, sensor measurements may not be directly interpretable by humans, so such label noise cannot simply be removed by manual inspection. As a result, time-series datasets often contain a significant amount of label noise that can degrade the performance of machine learning models. This work focuses on identifying and removing label noise by extending methods previously developed for static instances to the domain of time-series data. We use a combination of deep learning and visualization algorithms to facilitate automatic noise removal. We show that our approach can identify mislabeled instances, resulting in improved classification accuracy on four synthetic and two real, publicly available human activity datasets.
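The abstract does not spell out the detection procedure, so what follows is only a minimal sketch of one well-known technique for static instances, cross-validated classification filtering: each training instance is scored by a model trained on the other folds, and it is flagged as possibly mislabeled when that out-of-fold prediction disagrees with its given label. Whether this is the specific static-instance method the authors extend is an assumption, and it is not their CNN-and-visualization pipeline. The hand-crafted features (mean, standard deviation, range), the nearest-centroid classifier, the synthetic data, and the names `features`, `nearest_centroid_predict`, `flag_suspect_labels`, and `make_demo` are all hypothetical stand-ins chosen to keep the example dependency-free; a real pipeline would more likely use a learned model such as a CNN on raw accelerometer windows.

```python
import math
import random
import statistics
from typing import Dict, List, Sequence


def features(series: Sequence[float]) -> List[float]:
    """Hypothetical hand-crafted features: mean, population std, and range."""
    return [statistics.fmean(series), statistics.pstdev(series), max(series) - min(series)]


def nearest_centroid_predict(train_x: List[List[float]], train_y: List[int], x: List[float]) -> int:
    """Predict the label whose feature centroid is closest (squared Euclidean) to x."""
    centroids: Dict[int, List[float]] = {}
    for label in set(train_y):
        rows = [f for f, y in zip(train_x, train_y) if y == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*rows)]
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def flag_suspect_labels(series_list: List[Sequence[float]], labels: List[int],
                        k: int = 5, seed: int = 0) -> List[int]:
    """Return indices whose out-of-fold prediction disagrees with the given label."""
    feats = [features(s) for s in series_list]
    order = list(range(len(labels)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # k roughly equal folds
    suspects: List[int] = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        train_x = [feats[i] for i in train]
        train_y = [labels[i] for i in train]
        for i in fold:
            # The held-out instance never influences the model that judges it.
            if nearest_centroid_predict(train_x, train_y, feats[i]) != labels[i]:
                suspects.append(i)
    return sorted(suspects)


def make_demo(n_per_class: int = 20, length: int = 50, seed: int = 42):
    """Hypothetical synthetic data: class 0 is near-constant noise (a 'still'
    activity), class 1 is a noisy sine wave (a periodic activity).
    Labels of two instances per class are flipped to simulate label noise."""
    rng = random.Random(seed)
    series: List[List[float]] = []
    labels: List[int] = []
    for label in (0, 1):
        for _ in range(n_per_class):
            if label == 0:
                s = [rng.gauss(0.0, 0.1) for _ in range(length)]
            else:
                s = [1.0 + 3.0 * math.sin(2 * math.pi * t / 25) + rng.gauss(0.0, 0.1)
                     for t in range(length)]
            series.append(s)
            labels.append(label)
    flipped = [3, 11, 24, 37]  # indices 0-19 are class 0, 20-39 are class 1
    for i in flipped:
        labels[i] = 1 - labels[i]
    return series, labels, flipped


if __name__ == "__main__":
    series, labels, flipped = make_demo()
    print("flipped:", flipped)
    print("flagged:", flag_suspect_labels(series, labels))
```

On this well-separated toy data the filter flags exactly the flipped instances. A single-classifier filter like this discards an instance whenever one model disagrees with it, which risks removing hard but correctly labeled examples near class boundaries, precisely where the abstract says time-series labels are least reliable. A common, more conservative variant trains several different classifiers and flags an instance only on majority or consensus disagreement.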
Pages: 238-243
Page count: 6