CoDDA: A Flexible Copula-based Distribution Driven Analysis Framework for Large-Scale Multivariate Data

Cited by: 12
Authors
Hazarika, Subhashis [1]
Dutta, Soumya [1]
Shen, Han-Wei [1]
Chen, Jen-Ping [2]
Affiliations
[1] Ohio State Univ, GRAVITY Res Grp, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Dept Mech & Aerosp Engn, Columbus, OH 43210 USA
Keywords
In situ processing; Distribution-based; Multivariate; Query-driven; Copula; NONPARAMETRIC MODELS; VISUALIZATION; UNCERTAINTY; VARIABILITY
DOI
10.1109/TVCG.2018.2864801
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
CoDDA (Copula-based Distribution Driven Analysis) is a flexible framework for large-scale multivariate datasets. A common strategy for dealing with large-scale scientific simulation data is to partition the simulation domain and create statistical data summaries. Storing these compact statistical summaries instead of the high-resolution raw simulation output reduces storage overhead and alleviates the I/O bottleneck. Such summaries, often represented as probability distributions, can serve various post-hoc analysis and visualization tasks. However, for multivariate simulation data, creating data summaries with standard multivariate distributions is not feasible: they are either storage inefficient or too computationally expensive to estimate at simulation time (in situ) for a large number of variables. In this work, using copula functions, we propose a flexible multivariate distribution-based data modeling and analysis framework that offers significant data reduction and can be used in an in situ environment. The framework also facilitates storing the associated spatial information along with the multivariate distributions in an efficient representation. Using the proposed multivariate data summaries, we perform various multivariate post-hoc analyses such as query-driven visualization and sampling-based visualization. We evaluate the proposed method on multiple real-world multivariate scientific datasets. To demonstrate the efficacy of the framework in an in situ environment, we apply it to a large-scale flow simulation.
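The idea of a copula-based block summary can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian copula, two variables per block, and quantile tables as the univariate summaries, none of which the abstract specifies. All function names (`summarize_block`, `sample_block`, and the helpers) are hypothetical. The point it shows is that each variable keeps its own compact marginal summary, while the dependence between variables is reduced to a small set of copula parameters (here a single correlation value).

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: provides cdf() and inv_cdf()


def normal_scores(values):
    """Map values to standard-normal scores through their ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n lies strictly inside (0, 1), so inv_cdf is defined.
        scores[i] = STD_NORMAL.inv_cdf((rank + 0.5) / n)
    return scores


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def quantile_table(values, n_quantiles):
    """Compact marginal summary: n_quantiles evenly spaced order statistics."""
    s = sorted(values)
    last = len(s) - 1
    return [s[round(k * last / (n_quantiles - 1))] for k in range(n_quantiles)]


def summarize_block(xs, ys, n_quantiles=32):
    """Summarize one spatial block (needs at least two points).

    Assumption: a Gaussian copula whose parameter is estimated as the
    correlation of the rank-based normal scores of the two variables.
    """
    rho = pearson(normal_scores(xs), normal_scores(ys))
    return {
        "qx": quantile_table(xs, n_quantiles),
        "qy": quantile_table(ys, n_quantiles),
        "rho": rho,
    }


def inverse_cdf(table, u):
    """Approximate inverse marginal CDF by interpolating the quantile table."""
    pos = u * (len(table) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(table) - 1)
    return table[lo] + (pos - lo) * (table[hi] - table[lo])


def sample_block(summary, n, rng):
    """Draw n (x, y) pairs from the stored summary."""
    rho = summary["rho"]
    scale = math.sqrt(max(0.0, 1.0 - rho * rho))  # clamp guards rounding error
    samples = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + scale * rng.gauss(0.0, 1.0)  # correlated normal pair
        u1, u2 = STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)  # copula sample in [0, 1]^2
        samples.append((inverse_cdf(summary["qx"], u1),
                        inverse_cdf(summary["qy"], u2)))
    return samples
```

In this sketch, a block of any size is stored as two 32-entry quantile tables plus one correlation value, and `sample_block(summary, n, random.Random(0))` regenerates points whose marginals and dependence approximate the original block. Extending it to more variables would replace the single `rho` with a correlation matrix and the two-step normal draw with a Cholesky-based one.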
Pages: 1214 - 1224
Page count: 11
Related papers
50 in total
  • [31] Robust Operation of Flexible Distribution Network With Large-Scale EV Charging Loads
    Zhao, Jinli
    Qu, Jiahui
    Ji, Haoran
    Xu, Jing
    Hasanien, Hany M.
    Turky, Rania A.
    Li, Peng
    IEEE TRANSACTIONS ON TRANSPORTATION ELECTRIFICATION, 2024, 10 (01) : 2207 - 2219
  • [32] Large-Scale Data Analysis Using Heuristic Methods
    Dzemyda, Gintautas
    Sakalauskas, Leonidas
    INFORMATICA, 2011, 22 (01) : 1 - 10
  • [33] In Situ Data-Driven Adaptive Sampling for Large-scale Simulation Data Summarization
    Biswas, Ayan
    Dutta, Soumya
    Pulido, Jesus
    Ahrens, James
    PROCEEDINGS OF IN SITU INFRASTRUCTURES FOR ENABLING EXTREME-SCALE ANALYSIS AND VISUALIZATION (ISAV 2018), 2018 : 13 - 18
  • [34] A Vine Copula-Based Polynomial Chaos Framework for Improving Multi-Model Hydroclimatic Projections at a Multi-Decadal Convection-Permitting Scale
    Zhang, Boen
    Wang, Shuo
    Qing, Yamin
    Zhu, Jinxin
    Wang, Dagang
    Liu, Jiafeng
    WATER RESOURCES RESEARCH, 2022, 58 (06)
  • [35] Interactive Data Mining for Large-Scale Image Databases Based on Formal Concept Analysis
    Tanabata, Takanari
    Sawase, Kazuhito
    Nobuhara, Hajime
    Bede, Barnabas
    JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2010, 14 (03) : 303 - 308
  • [36] Copula-based multivariate renewal model for life-cycle analysis of civil infrastructure considering multiple dependent deterioration processes
    Li, Yaohan
    Dong, You
    Guo, Hongyuan
    RELIABILITY ENGINEERING & SYSTEM SAFETY, 2023, 231
  • [37] An open source analysis framework for large-scale building energy modeling
    Ball, Brian L.
    Long, Nicholas
    Fleming, Katherine
    Balbach, Chris
    Lopez, Phylroy
    JOURNAL OF BUILDING PERFORMANCE SIMULATION, 2020, 13 (05) : 487 - 500
  • [38] Data-driven robust optimization for the itinerary planning via large-scale GPS data
    Wu, Lei
    Hifi, Mhand
    KNOWLEDGE-BASED SYSTEMS, 2021, 231
  • [40] ESRGAN-based visualization for large-scale volume data
    Jiao, Chenyue
    Bi, Chongke
    Yang, Lu
    Wang, Zhen
    Xia, Zijun
    Ono, Kenji
    JOURNAL OF VISUALIZATION, 2023, 26 (03) : 649 - 665