Selection Bias Tracking and Detailed Subset Comparison for High-Dimensional Data

Times cited: 14
Authors
Borland, David [1]
Wang, Wenyuan [2]
Zhang, Jonathan [3]
Shrestha, Joshua [4]
Gotz, David [2]
Affiliations
[1] Univ N Carolina, RENCI, Chapel Hill, NC 27515 USA
[2] Univ N Carolina, Sch Informat & Lib Sci, Chapel Hill, NC 27515 USA
[3] Univ N Carolina, Dept Biostat, Chapel Hill, NC 27515 USA
[4] Univ N Carolina, Dept Comp Sci, Chapel Hill, NC 27515 USA
Funding
US National Science Foundation
Keywords
High-dimensional visualization; visual analytics; cohort selection; medical informatics; selection bias; VISUAL ANALYTICS; ADJUST; VISUALIZATION
DOI
10.1109/TVCG.2019.2934209
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
The collection of large, complex datasets has become common across a wide variety of domains. Visual analytics tools increasingly play a key role in exploring and answering complex questions about these large datasets. However, many visualizations are not designed to concurrently visualize the large number of dimensions present in complex datasets (e.g., tens of thousands of distinct codes in an electronic health record system). This fact, combined with the ability of many visual analytics systems to enable rapid, ad-hoc specification of groups, or cohorts, of individuals based on a small subset of visualized dimensions, leads to the possibility of introducing selection bias: when the user creates a cohort based on a specified set of dimensions, differences across many other unseen dimensions may also be introduced. These unintended side effects may result in the cohort no longer being representative of the larger population intended to be studied, which can negatively affect the validity of subsequent analyses. We present techniques for selection bias tracking and visualization that can be incorporated into high-dimensional exploratory visual analytics systems, with a focus on medical data with existing data hierarchies. These techniques include: (1) tree-based cohort provenance and visualization, including a user-specified baseline cohort that all other cohorts are compared against, and visual encoding of cohort "drift", which indicates where selection bias may have occurred, and (2) a set of visualizations, including a novel icicle-plot-based visualization, to compare in detail the per-dimension differences between the baseline and a user-specified focus cohort. These techniques are integrated into a medical temporal event sequence visual analytics tool. We present example use cases and report findings from domain expert user interviews.
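The cohort provenance tree and baseline-versus-focus comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Cohort` class, the use of per-code prevalence (fraction of members with a given code) as the per-dimension statistic, and the mean-absolute-difference `drift` score are all hypothetical stand-ins chosen only for illustration, as are the example codes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Cohort:
    """One node in a hypothetical cohort provenance tree."""
    name: str
    members: list[set[str]]  # each member = set of event codes for one individual
    parent: Cohort | None = field(default=None, repr=False, compare=False)
    children: list[Cohort] = field(default_factory=list, repr=False, compare=False)

    def refine(self, name: str, keep: Callable[[set[str]], bool]) -> Cohort:
        """Derive a child cohort by filtering members, recording provenance."""
        child = Cohort(name, [m for m in self.members if keep(m)], parent=self)
        self.children.append(child)
        return child

    def lineage(self) -> list[str]:
        """Names of the cohorts on the path from the root down to this one."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]


def prevalence(cohort: Cohort, dims: list[str]) -> dict[str, float]:
    """Fraction of cohort members that have each code (assumed statistic)."""
    n = len(cohort.members)
    return {d: (sum(d in m for m in cohort.members) / n if n else 0.0) for d in dims}


def per_dimension_difference(baseline: Cohort, focus: Cohort,
                             dims: list[str]) -> dict[str, float]:
    """Focus prevalence minus baseline prevalence, one entry per dimension."""
    base, foc = prevalence(baseline, dims), prevalence(focus, dims)
    return {d: foc[d] - base[d] for d in dims}


def drift(baseline: Cohort, focus: Cohort, dims: list[str]) -> float:
    """Hypothetical drift score: mean absolute per-dimension difference."""
    diffs = per_dimension_difference(baseline, focus, dims)
    return sum(abs(v) for v in diffs.values()) / len(dims) if dims else 0.0


if __name__ == "__main__":
    patients = [{"E11", "I10"}, {"I10"}, {"E11", "N18"}, {"J45"}]
    dims = sorted(set().union(*patients))
    baseline = Cohort("baseline", patients)
    focus = baseline.refine("has E11", lambda m: "E11" in m)
    print(focus.lineage())  # ['baseline', 'has E11']
    print(per_dimension_difference(baseline, focus, dims))
    # {'E11': 0.5, 'I10': 0.0, 'J45': -0.25, 'N18': 0.25}
    print(drift(baseline, focus, dims))  # 0.25
```

In the demo, the focus cohort is selected only on code E11, yet the prevalence of N18 and J45 also shifts relative to the baseline. That is the kind of unintended difference across unselected dimensions the abstract describes; a real system would presumably also aggregate these differences along the code hierarchy, which this sketch omits.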
Pages: 429-439
Number of pages: 11