High-Volume Hypothesis Testing: Systematic Exploration of Event Sequence Comparisons

Times cited: 32
Authors
Malik, Sana [1]
Shneiderman, Ben [1]
Du, Fan [1]
Plaisant, Catherine [1]
Bjarnadottir, Margret [2]
Affiliations
[1] Univ Maryland, Human Comp Interact Lab, College Pk, MD 20742 USA
[2] Univ Maryland, Robert H Smith Sch Business, College Pk, MD 20742 USA
Keywords
Cohort comparison; event sequences; visual analytics
DOI
10.1145/2890478
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cohort comparison studies have traditionally been hypothesis driven and conducted in carefully controlled environments (such as clinical trials). Given two groups of event sequence data, researchers test a single hypothesis (e.g., does the group taking Medication A exhibit more deaths than the group taking Medication B?). Recently, however, researchers have been moving toward more exploratory methods of retrospective analysis with existing data. In this article, we begin by showing that the task of cohort comparison is specific enough to support automatic computation against a bounded set of potential questions and objectives, a method that we refer to as High-Volume Hypothesis Testing (HVHT). From this starting point, we demonstrate that the diversity of these objectives, both across and within different domains, as well as the inherent complexities of real-world datasets, still require human involvement to determine meaningful insights. We explore how visualization and interaction can better support the task of exploratory data analysis and the understanding of HVHT results (how significant they are, why they are meaningful, and whether the entire dataset has been exhaustively explored). Through interviews and case studies with domain experts, we iteratively design and implement visualization and interaction techniques in a visual analytics tool, CoCo. As a result of our evaluation, we propose six design guidelines for enabling users to explore large result sets of HVHT systematically and flexibly in order to glean meaningful insights more quickly. Finally, we illustrate the utility of this method with three case studies in the medical domain.
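The abstract does not say which questions or statistical tests HVHT runs. As a minimal sketch, assuming the bounded question set is "does the share of records containing event X differ between the two cohorts?", one could test every event type with a pooled two-proportion z-test and, because many tests run at once, adjust the p-values with the Benjamini-Hochberg false discovery rate procedure. The function names (`high_volume_tests`, `two_proportion_p_value`, `benjamini_hochberg`), the choice of test, the choice of correction, and the toy cohorts are illustrative assumptions, not CoCo's implementation.

```python
import math
from typing import List, Sequence, Tuple

# Assumption (not from the article): each record is a list of event names, and
# each "question" asks whether the share of records containing a given event
# differs between two cohorts.
EventSequence = Sequence[str]
Result = Tuple[str, float, float, float]  # (event, share_a, share_b, adjusted_p)


def two_proportion_p_value(k1: int, n1: int, k2: int, n2: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Event present in all or none of the records in both cohorts.
        return 1.0
    z = (k1 / n1 - k2 / n2) / se
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, keeping adjusted values monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def high_volume_tests(
    cohort_a: List[EventSequence], cohort_b: List[EventSequence]
) -> List[Result]:
    """Run one test per event type and rank results by adjusted p-value."""
    events = sorted({e for seq in cohort_a + cohort_b for e in seq})
    n_a, n_b = len(cohort_a), len(cohort_b)
    raw = []
    for event in events:
        k_a = sum(event in seq for seq in cohort_a)
        k_b = sum(event in seq for seq in cohort_b)
        p = two_proportion_p_value(k_a, n_a, k_b, n_b)
        raw.append((event, k_a / n_a, k_b / n_b, p))
    adjusted = benjamini_hochberg([r[3] for r in raw])
    results = [(e, sa, sb, q) for (e, sa, sb, _), q in zip(raw, adjusted)]
    return sorted(results, key=lambda r: r[3])


if __name__ == "__main__":
    # Hypothetical toy cohorts, not data from the article.
    cohort_a = [["admit", "medA", "death"], ["admit", "medA"],
                ["admit", "medA", "death"], ["admit", "death"]]
    cohort_b = [["admit", "medB"], ["admit", "medB"],
                ["admit", "medB", "death"], ["admit"]]
    for event, share_a, share_b, q in high_volume_tests(cohort_a, cohort_b):
        print(f"{event:6s} A={share_a:.2f} B={share_b:.2f} adjusted p={q:.3f}")
```

Ranking by adjusted p-value only surfaces candidate differences; as the abstract argues, deciding which of them are meaningful still requires a human analyst.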
Pages: 23