Diagnosing Concept Drift with Visual Analytics

Cited by: 18
Authors
Yang, Weikai [1]
Li, Zhen [1]
Liu, Mengchen [2]
Lu, Yafeng [3]
Cao, Kelei [1]
Maciejewski, Ross [4]
Liu, Shixia [1]
Affiliations
[1] Tsinghua Univ, Sch Software, BNRist, Beijing, Peoples R China
[2] Microsoft, Redmond, WA USA
[3] Bloomberg LP, New York, NY USA
[4] Arizona State Univ, Comp Sci, Tempe, AZ 85287 USA
Source
2020 IEEE CONFERENCE ON VISUAL ANALYTICS SCIENCE AND TECHNOLOGY (VAST 2020) | 2020
Funding
National Natural Science Foundation of China; National Key R&D Program of China; US National Science Foundation
Keywords
Concept drift; streaming data; change detection; scatterplot; t-SNE; SHIFT;
DOI
10.1109/VAST50239.2020.00007
Chinese Library Classification
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
Concept drift is a phenomenon in which the distribution of a data stream changes over time in unforeseen ways, causing prediction models built on historical data to become inaccurate. While a variety of automated methods have been developed to identify when concept drift occurs, there is limited support for analysts who need to understand and correct their models when drift is detected. In this paper, we present a visual analytics method, DriftVis, to support model builders and analysts in the identification and correction of concept drift in streaming data. DriftVis combines a distribution-based drift detection method with a streaming scatterplot to support the analysis of drift caused by the distribution changes of data streams and to explore the impact of these changes on the model's accuracy. A quantitative experiment and two case studies on weather prediction and text classification have been conducted to demonstrate our proposed tool and illustrate how visual analytics can be used to support the detection, examination, and correction of concept drift.
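To make the idea of distribution-based drift detection concrete, the following is a minimal Python sketch and not the DriftVis implementation. It assumes drift is flagged by comparing a reference window with a new batch using the energy distance and a permutation test; the choice of statistic, the function names (`energy_distance`, `drift_detected`), and the parameters (`n_permutations`, `alpha`) are illustrative assumptions that do not come from the abstract.

```python
import numpy as np


def mean_pairwise_distance(a, b):
    """Mean Euclidean distance between every row of a and every row of b."""
    diffs = a[:, None, :] - b[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1)).mean()


def energy_distance(reference, batch):
    """Squared energy distance between two samples (V-statistic form).

    It is zero when both samples are identical and grows as the two
    empirical distributions move apart.
    """
    return (2.0 * mean_pairwise_distance(reference, batch)
            - mean_pairwise_distance(reference, reference)
            - mean_pairwise_distance(batch, batch))


def drift_detected(reference, batch, n_permutations=200, alpha=0.05, seed=0):
    """Permutation test (an assumed decision rule, not taken from the paper).

    Returns (flag, observed_distance, p_value), where flag is True when the
    observed distance is unusually large compared with random re-splits of
    the pooled data.
    """
    rng = np.random.default_rng(seed)
    observed = energy_distance(reference, batch)
    pooled = np.vstack([reference, batch])
    n_ref = len(reference)
    exceed = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)  # shuffles rows
        if energy_distance(shuffled[:n_ref], shuffled[n_ref:]) >= observed:
            exceed += 1
    p_value = (exceed + 1) / (n_permutations + 1)
    return p_value < alpha, observed, p_value


# Synthetic example: a historical window and a batch whose mean has moved.
rng = np.random.default_rng(42)
historical = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
shifted_batch = rng.normal(loc=3.0, scale=1.0, size=(100, 2))
flag, distance, p_value = drift_detected(historical, shifted_batch)
print(f"drift={flag} distance={distance:.3f} p={p_value:.3f}")
```

In a streaming setting, a check of this kind would typically be rerun as each new batch arrives, with the reference window updated once the model has been adapted to the new data.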
Pages: 12-23
Number of pages: 12