Deep convolutional and conditional neural networks for large-scale genomic data generation

Cited by: 3
Authors
Yelmen B. [1, 2]
Decelle A. [1, 3]
Boulos L.L. [1, 4]
Szatkownik A. [1 ]
Furtlehner C. [1 ]
Charpiat G. [1 ]
Jay F. [1 ]
Affiliations
[1] Université Paris-Saclay, CNRS, INRIA, LISN, Paris
[2] University of Tartu, Institute of Genomics, Tartu
[3] Universidad Complutense de Madrid, Departamento de Física Teórica, Madrid
[4] Université d’Évry Val-d’Essonne, Évry-Courcouronnes
Keywords
Complex networks - Convolution - Data privacy - Deep neural networks - Genes - Large dataset
DOI
10.1371/journal.pcbi.1011584
Abstract
Applications of generative models for genomic data have gained significant momentum in the past few years, with scopes ranging from data characterization to generation of genomic segments and functional sequences. In our previous study, we demonstrated that generative adversarial networks (GANs) and restricted Boltzmann machines (RBMs) can be used to create novel high-quality artificial genomes (AGs) which can preserve the complex characteristics of real genomes such as population structure, linkage disequilibrium and selection signals. However, a major drawback of these models is scalability, since the large feature space of genome-wide data increases computational complexity vastly. To address this issue, we implemented a novel convolutional Wasserstein GAN (WGAN) model along with a novel conditional RBM (CRBM) framework for generating AGs with high SNP number. These networks implicitly learn the varying landscape of haplotypic structure in order to capture complex correlation patterns along the genome and generate a wide diversity of plausible haplotypes. We performed comparative analyses to assess both the quality of these generated haplotypes and the amount of possible privacy leakage from the training data.
As the importance of genetic privacy becomes more prevalent, the need for effective privacy protection measures for genomic data increases. We used generative neural networks to create large artificial genome segments which possess many characteristics of real genomes without substantial privacy leakage from the training dataset. In the near future, with further improvements in haplotype quality and privacy preservation, large-scale artificial genome databases can be assembled to provide easily accessible surrogates of real databases, allowing researchers to conduct studies with diverse genomic data within a safe ethical framework in terms of donor privacy.
© 2023 Yelmen et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.