CoDDA: A Flexible Copula-based Distribution Driven Analysis Framework for Large-Scale Multivariate Data

Cited by: 12
Authors
Hazarika, Subhashis [1 ]
Dutta, Soumya [1 ]
Shen, Han-Wei [1 ]
Chen, Jen-Ping [2 ]
Affiliations
[1] Ohio State Univ, GRAVITY Res Grp, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Dept Mech & Aerosp Engn, Columbus, OH 43210 USA
Keywords
In situ processing; Distribution-based; Multivariate; Query-driven; Copula; NONPARAMETRIC MODELS; VISUALIZATION; UNCERTAINTY; VARIABILITY;
DOI
10.1109/TVCG.2018.2864801
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
CoDDA (Copula-based Distribution Driven Analysis) is a flexible framework for analyzing large-scale multivariate datasets. A common strategy for dealing with large-scale scientific simulation data is to partition the simulation domain and create statistical data summaries. Storing these compact summaries instead of the high-resolution raw simulation data reduces storage overhead and alleviates the I/O bottleneck. Such summaries, often represented as statistical probability distributions, can serve various post-hoc analysis and visualization tasks. However, for multivariate simulation data, using standard multivariate distributions to create data summaries is not feasible: they are either storage-inefficient or too computationally expensive to estimate at simulation time (in situ) for a large number of variables. In this work, using copula functions, we propose a flexible multivariate distribution-based data modeling and analysis framework that offers significant data reduction and can be used in an in situ environment. The framework also facilitates storing the associated spatial information along with the multivariate distributions in an efficient representation. Using the proposed multivariate data summaries, we perform various multivariate post-hoc analyses such as query-driven visualization and sampling-based visualization. We evaluate the proposed method on multiple real-world multivariate scientific datasets. To demonstrate the efficacy of the framework in an in situ environment, we apply it to a large-scale flow simulation.
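The general idea of a copula-based data summary can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not say which copula family or marginal representation CoDDA uses, so the Gaussian copula, the quantile-table marginals, and every function name below (`fit_summary`, `draw_samples`, and so on) are assumptions made for illustration. The sketch stores, per block of data, one small quantile table for each variable plus one correlation matrix, and then regenerates multivariate samples from that summary alone.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the Gaussian copula

def normal_scores(values):
    """Map one variable to standard-normal scores through its empirical ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n keeps the pseudo-observation strictly inside (0, 1)
        scores[i] = STD_NORMAL.inv_cdf((rank + 0.5) / n)
    return scores

def correlation(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def fit_summary(columns, num_quantiles=33):
    """Summarize a block: per-variable quantile table + copula correlation matrix."""
    d = len(columns)
    scores = [normal_scores(c) for c in columns]
    corr = [[correlation(scores[i], scores[j]) for j in range(d)] for i in range(d)]
    quantiles = []
    for c in columns:
        s = sorted(c)
        step = (len(s) - 1) / (num_quantiles - 1)
        quantiles.append([s[round(j * step)] for j in range(num_quantiles)])
    return {"quantiles": quantiles, "corr": corr}

def cholesky(m):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    d = len(m)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low

def inverse_cdf(quantiles, u):
    """Piecewise-linear inverse CDF over an equally spaced quantile table."""
    pos = u * (len(quantiles) - 1)
    lo = min(int(pos), len(quantiles) - 2)
    frac = pos - lo
    return quantiles[lo] + frac * (quantiles[lo + 1] - quantiles[lo])

def draw_samples(summary, count, rng):
    """Regenerate multivariate samples from the stored summary only."""
    low = cholesky(summary["corr"])
    d = len(low)
    samples = []
    for _ in range(count):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Correlate the independent normals, then map each to (0, 1)
        u = [STD_NORMAL.cdf(sum(low[i][k] * z[k] for k in range(i + 1)))
             for i in range(d)]
        samples.append([inverse_cdf(summary["quantiles"][i], u[i]) for i in range(d)])
    return samples

# Synthetic two-variable block (made-up data, not from the paper)
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
y = [math.exp(0.8 * xi + 0.6 * rng.gauss(0.0, 1.0)) for xi in x]
summary = fit_summary([x, y])
samples = draw_samples(summary, 1000, rng)
```

In this sketch the summary grows with one quantile table per variable plus a d-by-d correlation matrix, rather than with a full d-dimensional histogram, which is the kind of storage saving the abstract attributes to separating marginals from dependence.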
Pages: 1214-1224
Page count: 11