A Framework for Large-Scale Synthetic Graph Dataset Generation

Cited by: 0
Authors
Darabi, Sajad [1]
Bigaj, Piotr [1]
Majchrowski, Dawid [1]
Kasymov, Artur [1, 2]
Morkisz, Pawel [1, 3]
Fit-Florea, Alex [1]
Affiliations
[1] NVIDIA, Santa Clara, CA 95050 USA
[2] Jagiellonian Univ, Fac Math & Comp Sci, Doctoral Sch Exact & Nat Sci, PL-31007 Krakow, Poland
[3] AGH Univ Krakow, Fac Appl Math, PL-30059 Krakow, Poland
Keywords
Generators; Mathematical models; Biological system modeling; Deep learning; Iron; Complexity theory; Synthetic data; Stochastic processes; Social networking (online); Learning systems; Big data applications; graph neural networks (GNNs); machine learning; synthetic data; NETWORKS;
DOI
10.1109/TNNLS.2025.3540392
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, there has been increasing interest in developing and deploying deep graph learning algorithms for various tasks, such as fraud detection and recommender systems. However, there is a limited number of publicly available graph-structured datasets, most of which are small compared with production-sized applications or limited in their application domain. In this work, we tackle this shortcoming by proposing a synthetic graph generation tool that enables scaling datasets to production-size graphs with trillions of edges and billions of nodes. The proposed method comprises a series of parametric models that can either be randomly initialized or fit to proprietary datasets. These models can then be released to researchers to study graph methods on the synthetic data, facilitating prototype development and novel applications. We demonstrate the generalizability of the framework across various datasets, mimicking their structural and feature distributions, as well as the ability to scale them to varying sizes, demonstrating their usefulness for benchmarking and model development. Code can be found on GitHub.
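The fit-then-scale idea in the abstract can be illustrated with a minimal sketch. It assumes an R-MAT-style recursive generator as a stand-in; the abstract does not name the parametric models the framework actually uses, and the function names fit_quadrants and generate_rmat are hypothetical. The fit is a crude one-level estimate taken from edge counts in the four quadrants of the adjacency matrix.

```python
import random


def fit_quadrants(edges, num_nodes):
    """Crude one-level estimate of R-MAT quadrant probabilities (a, b, c, d).

    Assumption: this estimator is illustrative only, not the paper's
    fitting procedure.
    """
    if not edges:
        raise ValueError("need at least one edge to fit")
    half = num_nodes // 2
    counts = [0, 0, 0, 0]
    for src, dst in edges:
        # Quadrant index: 2 * (lower half of rows?) + (right half of columns?)
        counts[2 * (src >= half) + (dst >= half)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def generate_rmat(probs, scale, num_edges, seed=0):
    """Sample num_edges directed edges over 2**scale nodes.

    Each edge is placed by choosing one quadrant per bit of the node ids,
    so the same four parameters can produce graphs of any size.
    """
    rng = random.Random(seed)
    edges = []
    for _ in range(num_edges):
        src = dst = 0
        for _ in range(scale):
            q = rng.choices(range(4), weights=probs)[0]
            src = 2 * src + (q >> 1)  # high bit of q selects the row half
            dst = 2 * dst + (q & 1)   # low bit of q selects the column half
        edges.append((src, dst))
    return edges


# Fit on a tiny 4-node graph, then generate a 1024-node graph from the fit.
small_graph = [(0, 1), (0, 2), (1, 0), (3, 3)]
probs = fit_quadrants(small_graph, num_nodes=4)
large_graph = generate_rmat(probs, scale=10, num_edges=1000)
```

Only the four fitted probabilities need to be shared to regenerate a graph at a new size, which is the property the abstract relies on when it describes releasing models rather than proprietary data.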
Pages: 11