Synthetic Generation of High-Dimensional Datasets

Cited by: 31
Authors
Albuquerque, Georgia [1]
Loewe, Thomas [1]
Magnor, Marcus [1]
Affiliations
[1] TU Braunschweig, Comp Graph Lab, Braunschweig, Germany
Keywords
Synthetic data generation; multivariate data; high-dimensional data; interaction
DOI
10.1109/TVCG.2011.237
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Generating synthetic datasets is common practice in many research areas. Such data is often generated to meet specific needs or conditions that may not be easily found in the original, real data. The nature of the data varies with the application area and includes text, graphs, and social or weather data, among many others. Such datasets are commonly created with small scripts or programs that are restricted to small problems or to a specific application. In this paper, we propose a framework designed to generate high-dimensional datasets. Users can interactively create and navigate multidimensional datasets through a graphical user interface. Data creation is driven by statistical distributions based on a few user-defined parameters. First, a grounding dataset is created according to the given inputs; then structures and trends are added in selected dimensions and orthogonal projection planes. Furthermore, our framework supports the creation of complex non-orthogonal trends and classified datasets. It can be used to create synthetic datasets that simulate important trends such as multidimensional clusters, correlations, and outliers.
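The two-stage process described in the abstract (a grounding dataset first, then structures placed in selected dimensions) can be illustrated with a minimal sketch. This is not the authors' framework, which is interactive and GUI-driven; it is a small script of the kind the abstract contrasts itself with, written only to make the idea concrete. All function names, parameters, and numeric values (`make_grounding`, `add_cluster`, `add_correlation`, `add_outliers`, the Gaussian noise model, the seed) are assumptions chosen for illustration.

```python
import random

def make_grounding(n_rows, n_dims, rng):
    """Grounding dataset: independent standard-normal noise in every dimension."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n_dims)] for _ in range(n_rows)]

def add_cluster(data, rows, dims, center, spread, rng):
    """Overwrite the selected dimensions of the selected rows with a tight Gaussian cluster."""
    for r in rows:
        for d, c in zip(dims, center):
            data[r][d] = rng.gauss(c, spread)

def add_correlation(data, rows, src, dst, slope, noise, rng):
    """Make dimension dst a noisy linear function of dimension src for the selected rows."""
    for r in rows:
        data[r][dst] = slope * data[r][src] + rng.gauss(0.0, noise)

def add_outliers(data, rows, dim, offset):
    """Shift the selected rows far from the bulk of the data along one dimension."""
    for r in rows:
        data[r][dim] += offset

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Stage 1: grounding dataset with 500 rows and 6 dimensions (hypothetical sizes).
data = make_grounding(500, 6, rng)

# Stage 2: structures in selected dimensions.
add_cluster(data, range(0, 100), dims=(0, 1), center=(5.0, -5.0), spread=0.3, rng=rng)
add_correlation(data, range(100, 500), src=2, dst=3, slope=2.0, noise=0.1, rng=rng)
add_outliers(data, range(495, 500), dim=4, offset=25.0)

# Quick check that the planted correlation is present in dimensions 2 and 3.
r_23 = pearson([row[2] for row in data[100:]], [row[3] for row in data[100:]])
```

Each structure touches only the dimensions it is given, so the remaining dimensions keep the grounding noise. Non-orthogonal trends and class labels, which the abstract also mentions, are outside the scope of this sketch.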
Pages: 2317-2324
Number of pages: 8