VIBR: Visualizing Bipartite Relations at Scale with the Minimum Description Length Principle

被引:14
作者
Chan, Gromit Yeuk-Yin [1 ]
Xu, Panpan [2 ]
Dai, Zeng [2 ]
Ren, Liu [2 ]
机构
[1] NYU, New York, NY 10003 USA
[2] Bosch Res North Amer, Sunnyvale, CA USA
关键词
Bipartite Graph; Visual Summarization; Minimum Description Length; Information Theory; ANALYTICS; ALGORITHM; MATRICES; DESIGN;
D O I
10.1109/TVCG.2018.2864826
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
Bipartite graphs model the key relations in many large scale real-world data: customers purchasing items, legislators voting for bills, people's affiliation with different social groups, faults occurring in vehicles, etc. However, it is challenging to visualize large scale bipartite graphs with tens of thousands or even more nodes or edges. In this paper, we propose a novel visual summarization technique for bipartite graphs based on the minimum description length (MDL) principle. The method simultaneously groups the two different set of nodes and constructs aggregated bipartite relations with balanced granularity and precision. It addresses the key trade-off that often occurs for visualizing large scale and noisy data: acquiring a clear and uncluttered overview while maximizing the information content in it. We formulate the visual summarization task as a co-clustering problem and propose an efficient algorithm based on locality sensitive hashing (LSH) that can easily scale to large graphs under reasonable interactive time constraints that previous related methods cannot satisfy. The method leads to the opportunity of introducing a visual analytics framework with multiple levels-of-detail to facilitate interactive data exploration. In the framework, we also introduce a compact visual design inspired by adjacency list representation of graphs as the building block for a small multiples display to compare the bipartite relations for different subsets of data. We showcase the applicability and effectiveness of our approach by applying it on synthetic data with ground truth and performing case studies on real-world datasets from two application domains including roll-call vote record analysis and vehicle fault pattern analysis. Interviews with experts in the political science community and the automotive industry further highlight the benefits of our approach.
引用
收藏
页码:321 / 330
页数:10
相关论文
共 47 条
[1]  
Alsallakh B, 2014, EUR C VIS EUROVIS ST, P1, DOI DOI 10.2312/EUROVISSTAR.20141170
[2]  
[Anonymous], 2012, INFORM VISUAL
[3]  
[Anonymous], 2004, KDD
[4]  
[Anonymous], 2017, ARXIV171010777
[5]  
Bertin J., 2011, SEMIOLOGY GRAPHICS D
[6]   FacetAtlas: Multifaceted Visualization for Rich Text Corpora [J].
Cao, Nan ;
Sun, Jimeng ;
Lin, Yu-Ru ;
Gotz, David ;
Liu, Shixia ;
Qu, Huamin .
IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2010, 16 (06) :1172-1181
[7]   Legislative Voting Behavior, Seen and Unseen: A Theory of Roll-Call Vote Selection [J].
Carrubba, Clifford ;
Gabel, Matthew ;
Hug, Simon .
LEGISLATIVE STUDIES QUARTERLY, 2008, 33 (04) :543-572
[8]   Querying and Exploring Polygamous Relationships in Urban Spatio-Temporal Data Sets [J].
Chan, Yeuk-Yin ;
Chirigati, Fernando ;
Doraiswamy, Harish ;
Silva, Claudio T. ;
Freire, Juliana .
SIGMOD'17: PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2017, :1643-1646
[9]   Data Polygamy: The Many-Many Relationships among Urban Spatio-Temporal Data Sets [J].
Chirigati, Fernando ;
Doraiswamy, Harish ;
Damoulas, Theodoros ;
Freire, Juliana .
SIGMOD'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2016, :1011-1025
[10]  
Dhillon I. S., 2001, KDD-2001. Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, P269, DOI 10.1145/502512.502550