Selection Bias Tracking and Detailed Subset Comparison for High-Dimensional Data

Cited by: 14
Authors
Borland, David [1]
Wang, Wenyuan [2]
Zhang, Jonathan [3]
Shrestha, Joshua [4]
Gotz, David [2]
Affiliations
[1] Univ N Carolina, RENCI, Chapel Hill, NC 27515 USA
[2] Univ N Carolina, Sch Informat & Lib Sci, Chapel Hill, NC 27515 USA
[3] Univ N Carolina, Dept Biostat, Chapel Hill, NC 27515 USA
[4] Univ N Carolina, Dept Comp Sci, Chapel Hill, NC 27515 USA
Funding
National Science Foundation (USA)
Keywords
High-dimensional visualization; visual analytics; cohort selection; medical informatics; selection bias; VISUAL ANALYTICS; ADJUST; VISUALIZATION;
DOI
10.1109/TVCG.2019.2934209
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
The collection of large, complex datasets has become common across a wide variety of domains. Visual analytics tools increasingly play a key role in exploring and answering complex questions about these large datasets. However, many visualizations are not designed to concurrently visualize the large number of dimensions present in complex datasets (e.g., tens of thousands of distinct codes in an electronic health record system). This fact, combined with the ability of many visual analytics systems to enable rapid, ad-hoc specification of groups, or cohorts, of individuals based on a small subset of visualized dimensions, leads to the possibility of introducing selection bias: when the user creates a cohort based on a specified set of dimensions, differences across many other unseen dimensions may also be introduced. These unintended side effects may result in the cohort no longer being representative of the larger population intended to be studied, which can negatively affect the validity of subsequent analyses. We present techniques for selection bias tracking and visualization that can be incorporated into high-dimensional exploratory visual analytics systems, with a focus on medical data with existing data hierarchies. These techniques include: (1) tree-based cohort provenance and visualization, including a user-specified baseline cohort that all other cohorts are compared against, and visual encoding of cohort "drift", which indicates where selection bias may have occurred, and (2) a set of visualizations, including a novel icicle-plot based visualization, to compare in detail the per-dimension differences between the baseline and a user-specified focus cohort. These techniques are integrated into a medical temporal event sequence visual analytics tool. We present example use cases and report findings from domain expert user interviews.
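The per-dimension comparison between a baseline cohort and a focus cohort described in the abstract can be illustrated with a small sketch. The abstract does not say which difference measure is used, so the Hellinger distance below is an assumption chosen for illustration, and the record layout, the function names (`distribution`, `hellinger`, `cohort_drift`), and the toy patient data are all hypothetical.

```python
import math
from collections import Counter


def distribution(values):
    """Return the relative frequency of each category in an iterable."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: count / total for key, count in counts.items()}


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint).

    Assumption: this measure is used here only for illustration.
    """
    keys = set(p) | set(q)
    squared = sum(
        (math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in keys
    )
    return math.sqrt(squared / 2.0)


def cohort_drift(baseline, focus, dimensions):
    """Compare the focus cohort against the baseline, one dimension at a time."""
    return {
        dim: hellinger(
            distribution(row[dim] for row in baseline),
            distribution(row[dim] for row in focus),
        )
        for dim in dimensions
    }


# Hypothetical toy data: the user filters on age_group only, but the
# distribution of the unseen "diabetes" dimension shifts as a side effect.
baseline = [
    {"age_group": "old", "diabetes": "yes"},
    {"age_group": "old", "diabetes": "yes"},
    {"age_group": "young", "diabetes": "no"},
    {"age_group": "young", "diabetes": "yes"},
]
focus = [row for row in baseline if row["age_group"] == "old"]

drift = cohort_drift(baseline, focus, ["age_group", "diabetes"])
for dim, value in sorted(drift.items(), key=lambda item: -item[1]):
    print(f"{dim}: {value:.3f}")
```

In this toy example the filter is applied to `age_group`, yet `diabetes` also receives a nonzero score, which is the kind of unintended difference in an unseen dimension that the abstract describes. A real system would also need to handle numeric dimensions and hierarchical codes, which this sketch does not attempt.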
Pages: 429-439
Number of pages: 11