Deep convolutional and conditional neural networks for large-scale genomic data generation

Cited by: 3
Authors
Yelmen B. [1, 2]
Decelle A. [1, 3]
Boulos L.L. [1, 4]
Szatkownik A. [1 ]
Furtlehner C. [1 ]
Charpiat G. [1 ]
Jay F. [1 ]
Affiliations
[1] Université Paris-Saclay, CNRS, INRIA, LISN, Paris
[2] University of Tartu, Institute of Genomics, Tartu
[3] Universidad Complutense de Madrid, Departamento de Física Teórica, Madrid
[4] Université d’Évry Val-d’Essonne, Évry-Courcouronnes
Keywords
Complex networks - Convolution - Data privacy - Deep neural networks - Genes - Large dataset
DOI
10.1371/journal.pcbi.1011584
Abstract
Applications of generative models for genomic data have gained significant momentum in the past few years, with scopes ranging from data characterization to generation of genomic segments and functional sequences. In our previous study, we demonstrated that generative adversarial networks (GANs) and restricted Boltzmann machines (RBMs) can be used to create novel high-quality artificial genomes (AGs) which can preserve the complex characteristics of real genomes such as population structure, linkage disequilibrium and selection signals. However, a major drawback of these models is scalability, since the large feature space of genome-wide data increases computational complexity vastly. To address this issue, we implemented a novel convolutional Wasserstein GAN (WGAN) model along with a novel conditional RBM (CRBM) framework for generating AGs with high SNP number. These networks implicitly learn the varying landscape of haplotypic structure in order to capture complex correlation patterns along the genome and generate a wide diversity of plausible haplotypes. We performed comparative analyses to assess both the quality of these generated haplotypes and the amount of possible privacy leakage from the training data. As the importance of genetic privacy becomes more prevalent, the need for effective privacy protection measures for genomic data increases. We used generative neural networks to create large artificial genome segments which possess many characteristics of real genomes without substantial privacy leakage from the training dataset. In the near future, with further improvements in haplotype quality and privacy preservation, large-scale artificial genome databases can be assembled to provide easily accessible surrogates of real databases, allowing researchers to conduct studies with diverse genomic data within a safe ethical framework in terms of donor privacy.
© 2023 Yelmen et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
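The abstract mentions assessing possible privacy leakage from the training data but does not say here how that was measured. As a rough illustration only, one simple check of this kind is to ask how close each generated haplotype lies to its nearest training haplotype. The sketch below assumes haplotypes are encoded as equal-length 0/1 lists (one entry per SNP). The function names and the toy data are hypothetical, and this is not presented as the authors' metric.

```python
from typing import List, Sequence

# Assumption: a haplotype is a sequence of 0/1 alleles, one per SNP position.
Haplotype = Sequence[int]


def hamming(a: Haplotype, b: Haplotype) -> int:
    """Count the SNP positions at which two haplotypes differ."""
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same SNP positions")
    return sum(x != y for x, y in zip(a, b))


def nearest_distances(queries: Sequence[Haplotype],
                      reference: Sequence[Haplotype]) -> List[int]:
    """For each query haplotype, distance to its closest reference haplotype."""
    return [min(hamming(q, r) for r in reference) for q in queries]


def copy_fraction(synthetic: Sequence[Haplotype],
                  training: Sequence[Haplotype]) -> float:
    """Share of synthetic haplotypes that exactly duplicate a training one."""
    distances = nearest_distances(synthetic, training)
    return sum(d == 0 for d in distances) / len(distances)


# Toy data (made up for illustration, 6 SNPs per haplotype).
training = [
    [0, 1, 0, 1, 1, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 1],
]
synthetic = [
    [0, 1, 0, 1, 1, 0],  # exact copy of the first training haplotype
    [1, 0, 0, 0, 1, 1],
    [0, 0, 1, 0, 0, 1],
]

if __name__ == "__main__":
    print(nearest_distances(synthetic, training))
    print(copy_fraction(synthetic, training))
```

A high share of zero or very small nearest distances would suggest the generator is reproducing training individuals rather than producing novel haplotypes. In practice such distances are usually compared against a held-out set of real haplotypes to calibrate what "too close" means.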