Deep convolutional and conditional neural networks for large-scale genomic data generation

Cited by: 3
Authors
Yelmen B. [1, 2]
Decelle A. [1, 3]
Boulos L.L. [1, 4]
Szatkownik A. [1]
Furtlehner C. [1]
Charpiat G. [1]
Jay F. [1]
Affiliations
[1] Université Paris-Saclay, CNRS, INRIA, LISN, Paris
[2] University of Tartu, Institute of Genomics, Tartu
[3] Universidad Complutense de Madrid, Departamento de Física Teórica, Madrid
[4] Université d’Évry Val-d’Essonne, Évry-Courcouronnes
Keywords
Complex networks - Convolution - Data privacy - Deep neural networks - Genes - Large dataset
DOI
10.1371/journal.pcbi.1011584
Abstract
Applications of generative models for genomic data have gained significant momentum in the past few years, with scopes ranging from data characterization to the generation of genomic segments and functional sequences. In our previous study, we demonstrated that generative adversarial networks (GANs) and restricted Boltzmann machines (RBMs) can be used to create novel, high-quality artificial genomes (AGs) that can preserve complex characteristics of real genomes such as population structure, linkage disequilibrium and selection signals. However, a major drawback of these models is scalability: the large feature space of genome-wide data vastly increases computational complexity. To address this issue, we implemented a novel convolutional Wasserstein GAN (WGAN) model along with a novel conditional RBM (CRBM) framework for generating AGs with a high number of SNPs. These networks implicitly learn the varying landscape of haplotypic structure in order to capture complex correlation patterns along the genome and generate a wide diversity of plausible haplotypes. We performed comparative analyses to assess both the quality of the generated haplotypes and the amount of possible privacy leakage from the training data. As genetic privacy grows in importance, so does the need for effective privacy protection measures for genomic data. We used generative neural networks to create large artificial genome segments that possess many characteristics of real genomes without substantial privacy leakage from the training dataset. In the near future, with further improvements in haplotype quality and privacy preservation, large-scale artificial genome databases can be assembled to provide easily accessible surrogates of real databases, allowing researchers to conduct studies with diverse genomic data within a safe ethical framework in terms of donor privacy.
© 2023 Yelmen et al.
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
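The abstract states that the artificial genomes preserve linkage disequilibrium (LD), that is, the correlation between alleles at SNPs along the genome. As a hedged illustration of how such a comparison can be made, the sketch below computes the standard pairwise LD statistic r² for a set of haplotypes and summarises how far the r² values of an artificial set deviate from those of a real set. This is a generic LD computation, not the authors' evaluation pipeline: the function names `ld_r2` and `ld_difference`, the 0/1 haplotype encoding, the optional `max_dist` window and the mean-absolute-difference summary are all assumptions made for this example.

```python
import numpy as np


def ld_r2(haplotypes):
    """Pairwise linkage-disequilibrium r^2 matrix.

    `haplotypes` is an (n_haplotypes, n_snps) array of 0/1 alleles
    (assumed encoding). For SNPs A and B:
        D   = p_AB - p_A * p_B
        r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B))
    Pairs involving a monomorphic SNP are undefined and returned as NaN.
    """
    h = np.asarray(haplotypes, dtype=float)
    freqs = h.mean(axis=0)                   # allele frequency p_A of every SNP
    centered = h - freqs
    d = centered.T @ centered / h.shape[0]   # D = p_AB - p_A p_B for every pair
    var = freqs * (1.0 - freqs)              # p_A (1 - p_A) for every SNP
    denom = np.outer(var, var)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, d**2 / denom, np.nan)


def ld_difference(real, artificial, max_dist=None):
    """Mean absolute difference of r^2 between two haplotype sets.

    If `max_dist` is given (hypothetical windowing choice), only SNP pairs
    at most `max_dist` positions apart are compared. Pairs that are
    undefined (NaN) in either set are skipped.
    """
    r_real, r_art = ld_r2(real), ld_r2(artificial)
    i, j = np.triu_indices(r_real.shape[0], k=1)   # each SNP pair once
    if max_dist is not None:
        keep = (j - i) <= max_dist
        i, j = i[keep], j[keep]
    a, b = r_real[i, j], r_art[i, j]
    ok = ~(np.isnan(a) | np.isnan(b))
    return float(np.mean(np.abs(a[ok] - b[ok])))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_hap, n_snp = 200, 30
    # Toy "real" haplotypes with LD: each SNP copies the previous one,
    # except that the allele is flipped with probability 0.1.
    real = np.empty((n_hap, n_snp), dtype=int)
    real[:, 0] = rng.integers(0, 2, size=n_hap)
    for s in range(1, n_snp):
        flip = rng.random(n_hap) < 0.1
        real[:, s] = np.where(flip, 1 - real[:, s - 1], real[:, s - 1])

    # Toy "artificial" sets: a row-shuffled copy keeps the LD exactly,
    # while independent random haplotypes have essentially no LD.
    copied = real[rng.permutation(n_hap)]
    independent = rng.integers(0, 2, size=(n_hap, n_snp))
    print("copy vs real:       ", ld_difference(real, copied, max_dist=5))
    print("independent vs real:", ld_difference(real, independent, max_dist=5))
```

Note that the row-shuffled copy matches the real LD essentially perfectly, so an LD similarity score on its own cannot distinguish a good generator from one that memorises its training data. This is why quality analyses need to be paired with a separate assessment of privacy leakage, as the abstract describes.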