Towards Better Modeling With Missing Data: A Contrastive Learning-Based Visual Analytics Perspective

Citations: 0
Authors
Xie, Laixin [1]
Ouyang, Yang [1]
Chen, Longfei [1]
Wu, Ziming [2]
Li, Quan [6, 1]
Affiliations
[1] ShanghaiTech Univ, Shanghai Engn Res Ctr Intelligent Vis & Imaging, Sch Informat Sci & Technol, Shanghai 201210, Peoples R China
[2] Tencent Inc, Shenzhen 518054, Guangdong, Peoples R China
Keywords
Data models; Predictive models; Training; Task analysis; Analytical models; Data visualization; Numerical models; Explainable AI; missing data; data imputation; contrastive learning; DIAGNOSIS;
DOI
10.1109/TVCG.2023.3285210
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Missing data can pose a challenge for machine learning (ML) modeling. To address this, current approaches are categorized into feature imputation and label prediction and are primarily focused on handling missing data to enhance ML performance. These approaches rely on the observed data to estimate the missing values and therefore encounter three main shortcomings in imputation, including the need for different imputation methods for various missing data mechanisms, heavy dependence on the assumption of data distribution, and potential introduction of bias. This study proposes a Contrastive Learning (CL) framework to model observed data with missing values, where the ML model learns the similarity between an incomplete sample and its complete counterpart and the dissimilarity between other samples. Our proposed approach demonstrates the advantages of CL without requiring any imputation. To enhance interpretability, we introduce CIVis, a visual analytics system that incorporates interpretable techniques to visualize the learning process and diagnose the model status. Users can leverage their domain knowledge through interactive sampling to identify negative and positive pairs in CL. The output of CIVis is an optimized model that takes specified features and predicts downstream tasks. We provide two usage scenarios in regression and classification tasks and conduct quantitative experiments, expert interviews, and a qualitative user study to demonstrate the effectiveness of our approach. In short, this study offers a valuable contribution to addressing the challenges associated with ML modeling in the presence of missing data by providing a practical solution that achieves high predictive accuracy and model interpretability.
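As a rough illustration of the contrastive objective the abstract describes (pulling an incomplete sample toward its complete counterpart and pushing it away from other samples), the following is a minimal NumPy sketch. It is not the authors' implementation: the InfoNCE-style loss, the zero-filling of missing entries, the linear encoder, and the names `mask_features`, `encode`, and `info_nce_loss` are all assumptions made for illustration.

```python
import numpy as np

def mask_features(x, missing_rate, rng):
    """Simulate missingness: zero out random entries and return the observed mask.
    Zero-filling is an assumption made only to keep the sketch short."""
    observed = rng.random(x.shape) >= missing_rate  # True where the value is kept
    return np.where(observed, x, 0.0), observed

def encode(x, weights):
    """Hypothetical linear encoder followed by L2 normalization."""
    z = x @ weights
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)

def info_nce_loss(z_incomplete, z_complete, temperature=0.5):
    """InfoNCE-style loss: row i of z_incomplete is a positive pair with row i
    of z_complete and a negative pair with every other row."""
    logits = z_incomplete @ z_complete.T / temperature   # (n, n) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
x_complete = rng.normal(size=(8, 5))          # toy complete samples
x_incomplete, observed = mask_features(x_complete, missing_rate=0.3, rng=rng)
weights = rng.normal(size=(5, 4))             # untrained encoder weights

loss = info_nce_loss(encode(x_incomplete, weights), encode(x_complete, weights))
print(f"contrastive loss: {loss:.4f}")
```

In a training loop, the encoder weights would be updated to reduce this loss, so that no imputed values are ever needed as targets. How positive and negative pairs are actually chosen in CIVis (including the interactive sampling mentioned above) is not specified in this record.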
Pages: 5129-5146
Page count: 18