ProcessGAN: Generating Privacy-Preserving Time-Aware Process Data with Conditional Generative Adversarial Nets

Cited by: 0
Authors
Li, Keyi [1]
Yang, Sen [2]
Sullivan, Travis M. [3]
Burd, Randall S. [3]
Marsic, Ivan [1]
Affiliations
[1] Rutgers State Univ, Elect & Comp Engn Dept, New Brunswick, NJ 08901 USA
[2] Waymo, Mountain View, CA USA
[3] Childrens Natl Hosp, Washington, DC USA
Funding
National Institutes of Health (USA);
Keywords
Synthetic data generation; Process mining; Sequential data; Generative adversarial networks; Data privacy; Time aware;
DOI
10.1145/3687464
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Process data constructed from event logs provides valuable insights into procedural dynamics over time. The confidential information in process data, together with the data's intricate nature, prevents the datasets from being shared and makes them challenging to collect. Consequently, research using process data and analytics in the process mining domain is limited. In this study, we introduced a synthetic process data generation task to address the shortage of sharable process data. We introduced a generative adversarial network, called ProcessGAN, to generate process data consisting of activity sequences and their corresponding timestamps. ProcessGAN consists of a transformer-based network as the generator and a time-aware self-attention network as the discriminator. It can generate privacy-preserving process data from random noise. ProcessGAN considers the duration of the process and the time intervals between activities to generate realistic activity sequences with timestamps. We evaluated ProcessGAN on five real-world datasets: two public datasets and three private datasets collected in medical domains. To evaluate the synthetic data, in addition to statistical metrics, we trained a supervised model to score the synthetic processes. We also used process mining to discover workflows for the synthetic medical processes and had domain experts evaluate the clinical applicability of the synthetic workflows. ProcessGAN outperformed the existing generative models in generating complex processes with valid parallel pathways. The synthetic process data generated by ProcessGAN better represented the long-range dependencies between activities, a feature relevant to complicated medical and other processes. The timestamps generated by the ProcessGAN model showed distributions similar to those of the authentic timestamps. In addition, we trained a transformer-based network to generate synthetic contexts (e.g., patient demographics) associated with the synthetic processes. The synthetic contexts generated by our model outperformed those of the baseline models, with distributions similar to those of the authentic contexts. We conclude that ProcessGAN can generate sharable synthetic process data indistinguishable from authentic data. Our source code is available at https://github.com/raaachli/ProcessGAN.
Pages: 31