CoDDA: A Flexible Copula-based Distribution Driven Analysis Framework for Large-Scale Multivariate Data

Times cited: 12
Authors
Hazarika, Subhashis [1]
Dutta, Soumya [1]
Shen, Han-Wei [1]
Chen, Jen-Ping [2]
Affiliations
[1] Ohio State Univ, GRAVITY Res Grp, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Dept Mech & Aerosp Engn, Columbus, OH 43210 USA
Keywords
In situ processing; Distribution-based; Multivariate; Query-driven; Copula; NONPARAMETRIC MODELS; VISUALIZATION; UNCERTAINTY; VARIABILITY
DOI
10.1109/TVCG.2018.2864801
Chinese Library Classification number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
CoDDA (Copula-based Distribution Driven Analysis) is a flexible framework for large-scale multivariate datasets. A common strategy for dealing with large-scale scientific simulation data is to partition the simulation domain and create statistical data summaries. Storing these compact summaries instead of the high-resolution raw simulation data reduces storage overhead and alleviates the I/O bottleneck. Such summaries, often represented as statistical probability distributions, can serve various post-hoc analysis and visualization tasks. However, for multivariate simulation data, using standard multivariate distributions to create data summaries is not feasible: they are either storage-inefficient or too computationally expensive to estimate at simulation time (in situ) for a large number of variables. In this work, using copula functions, we propose a flexible multivariate distribution-based data modeling and analysis framework that offers significant data reduction and can be used in an in situ environment. The framework also facilitates storing the associated spatial information along with the multivariate distributions in an efficient representation. Using the proposed multivariate data summaries, we perform various multivariate post-hoc analyses, such as query-driven visualization and sampling-based visualization. We evaluate the proposed method on multiple real-world multivariate scientific datasets. To demonstrate the efficacy of our framework in an in situ environment, we apply it to a large-scale flow simulation.
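The abstract describes the copula idea only at a high level. By Sklar's theorem, a joint CDF can be written as a copula applied to the marginal CDFs, F(x_1, ..., x_d) = C(F_1(x_1), ..., F_d(x_d)), so the marginals and the dependence can be stored separately. The following is a minimal sketch, assuming a Gaussian copula with histogram marginals for a single spatial block; it is an illustration of the general technique, not the authors' implementation, and the function names `fit_gaussian_copula`, `sample_gaussian_copula` and `storage_counts` are hypothetical. It summarizes an (n, d) block as d one-dimensional histograms plus a d x d correlation matrix, and draws samples back from that summary, as a sampling-based visualization might.

```python
import numpy as np
from statistics import NormalDist

# Standard normal N(0, 1); cdf/inv_cdf map between copula space and normal scores.
_STD_NORMAL = NormalDist()
_norm_ppf = np.vectorize(_STD_NORMAL.inv_cdf)
_norm_cdf = np.vectorize(_STD_NORMAL.cdf)


def fit_gaussian_copula(block, bins=32):
    """Summarize an (n, d) block as d marginal histograms plus a d x d correlation matrix.

    Hypothetical helper: a Gaussian copula with histogram marginals is an assumed model.
    """
    n, d = block.shape
    # 1) Univariate marginals: one histogram (counts, edges) per variable.
    marginals = [np.histogram(block[:, j], bins=bins) for j in range(d)]
    # 2) Rank-transform each column to pseudo-observations strictly inside (0, 1).
    ranks = block.argsort(axis=0).argsort(axis=0)
    u = (ranks + 1) / (n + 1)
    # 3) Dependence: Pearson correlation of the normal scores Phi^{-1}(u).
    corr = np.corrcoef(_norm_ppf(u), rowvar=False)
    return marginals, corr


def sample_gaussian_copula(marginals, corr, n_samples, rng):
    """Draw samples whose marginals follow the histograms and whose dependence follows corr."""
    d = corr.shape[0]
    # Correlated standard normals: rows of Z @ L.T have covariance L @ L.T = corr.
    chol = np.linalg.cholesky(corr)
    z = rng.standard_normal((n_samples, d)) @ chol.T
    # Map to uniforms, then through each marginal's (piecewise-linear) inverse CDF.
    u = _norm_cdf(z)
    out = np.empty_like(u)
    for j, (counts, edges) in enumerate(marginals):
        # Histogram CDF evaluated at the bin edges: 0 at edges[0], 1 at edges[-1].
        cdf = np.concatenate([[0.0], np.cumsum(counts) / counts.sum()])
        out[:, j] = np.interp(u[:, j], cdf, edges)
    return out


def storage_counts(d, bins):
    """Numbers stored per block (bin counts and coefficients only; bin edges ignored)."""
    joint = bins ** d                     # full d-dimensional joint histogram
    copula = d * bins + d * (d - 1) // 2  # d marginals + upper triangle of corr
    return joint, copula
```

For example, `storage_counts(4, 32)` returns `(1048576, 134)`: a 32-bin joint histogram over four variables needs 32^4 cells, whereas the copula summary needs four 32-bin marginals plus six correlation coefficients. The rank-based normal scores make the dependence estimate invariant to monotone transforms of each variable, which is why skewed variables can keep arbitrary marginals.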
Pages: 1214-1224
Number of pages: 11
Related papers (50 in total)
  • [41] Distribution-based Particle Data Reduction for In-situ Analysis and Visualization of Large-scale N-body Cosmological Simulations
    Li, Guan
    Xu, Jiayi
    Zhang, Tianchi
    Shan, Guihua
    Shen, Han-Wei
    Wang, Ko-Chih
    Liao, Shihong
    Lu, Zhonghua
    2020 IEEE PACIFIC VISUALIZATION SYMPOSIUM (PACIFICVIS), 2020, : 171 - 180
  • [42] Visual Analysis of Large-Scale Protein-Ligand Interaction Data
    Schatz, Karsten
    Franco-Moreno, Juan Jose
    Schafer, Marco
    Rose, Alexander S.
    Ferrario, Valerio
    Pleiss, Jurgen
    Vazquez, Pere-Pau
    Ertl, Thomas
    Krone, Michael
    COMPUTER GRAPHICS FORUM, 2021, 40 (06) : 394 - 408
  • [43] Large-scale, realistic cloud visualization based on weather forecast data
    Hufnagel, Roland
    Held, Martin
    Schroeder, Florian
    PROCEEDINGS OF THE NINTH IASTED INTERNATIONAL CONFERENCE ON COMPUTER GRAPHICS AND IMAGING, 2007, : 54 - 59
  • [44] Progressive Tree-Based Compression of Large-Scale Particle Data
    Hoang, Duong
    Bhatia, Harsh
    Lindstrom, Peter
    Pascucci, Valerio
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2024, 30 (07) : 4321 - 4338
  • [45] Efficient and Portable Distribution Modeling for Large-Scale Scientific Data Processing with Data-Parallel Primitives
    Yang, Hao-Yi
    Lin, Zhi-Rong
    Wang, Ko-Chih
    ALGORITHMS, 2021, 14 (10)
  • [46] A Multivariate Frequency Analysis Framework to Estimate the Return Period of Hurricane Events Using Event-Based Copula
    Cho, Eunsaem
    Ahmadisharaf, Ebrahim
    Done, James
    Yoo, Chulsang
    WATER RESOURCES RESEARCH, 2023, 59 (12)
  • [47] Data-driven robust optimization in the face of large-scale datasets: An incremental learning approach
    Asgari, Somayeh Danesh
    Mohammadi, Emran
    Makui, Ahmad
    Jafari, Mostafa
    JOURNAL OF COMPUTATIONAL SCIENCE, 2024, 83
  • [48] SCANPY: large-scale single-cell gene expression data analysis
    Wolf, F. Alexander
    Angerer, Philipp
    Theis, Fabian J.
    GENOME BIOLOGY, 19
  • [49] A Statistical Framework for Neuroimaging Data Analysis Based on Mutual Information Estimated via a Gaussian Copula
    Ince, Robin A. A.
    Giordano, Bruno L.
    Kayser, Christoph
    Rousselet, Guillaume A.
    Gross, Joachim
    Schyns, Philippe G.
    HUMAN BRAIN MAPPING, 2017, 38 (03) : 1541 - 1573
  • [50] Efficient processing and analysis of large-scale light-sheet microscopy data
    Amat, Fernando
    Hoeckendorf, Burkhard
    Wan, Yinan
    Lemon, William C.
    McDole, Katie
    Keller, Philipp J.
    NATURE PROTOCOLS, 2015, 10 (11) : 1679 - 1696