Feedback GAN for DNA optimizes protein functions

Cited by: 125
Authors
Gupta, Anvita [1]
Zou, James [1, 2]
Affiliations
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Biomed Data Sci, Stanford, CA 94305 USA
Funding
National Science Foundation (USA);
Keywords
DATABASE;
DOI
10.1038/s42256-019-0017-4
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Generative adversarial networks (GANs) represent an attractive and novel approach to generating realistic data, such as genes, proteins or drugs, in synthetic biology. Here, we apply GANs to generate synthetic DNA sequences encoding proteins of variable length. We propose a novel feedback-loop architecture, feedback GAN (FBGAN), to optimize the synthetic gene sequences for desired properties using an external function analyser. The proposed architecture also has the advantage that the analyser does not need to be differentiable. We apply the feedback-loop mechanism to two examples: generating synthetic genes coding for antimicrobial peptides, and optimizing synthetic genes for the secondary structure of their resulting peptides. A suite of metrics, calculated in silico, demonstrates that the GAN-generated proteins have desirable biophysical properties. The FBGAN architecture can also be used to optimize GAN-generated data points for useful properties in domains beyond genomics.

Generative machine learning models are used in synthetic biology to find new structures such as DNA sequences, proteins and other macromolecules, with applications in drug discovery, environmental treatment and manufacturing. Gupta and Zou propose and demonstrate in silico a feedback-loop architecture that optimizes the output of a generative adversarial network for synthetic genes, steering it towards genes that specifically code for antimicrobial peptides.
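The abstract describes a feedback loop in which an external, possibly non-differentiable analyser steers a generative model. The Python sketch below is a toy illustration of that general idea only, not the authors' FBGAN implementation. Everything in it is an assumption made for illustration: the GAN is replaced by a simple per-position frequency model, the analyser is a hypothetical GC-content scorer, and the update rule (accepted samples overwrite the oldest training sequences) is one plausible way to close the loop.

```python
import random

ALPHABET = "ACGT"


def analyser(seq):
    """Hypothetical black-box scorer: GC fraction of the sequence.

    It is only ever called, never differentiated, which is the point
    of the feedback-loop idea described in the abstract.
    """
    return (seq.count("G") + seq.count("C")) / len(seq)


def fit_generator(data):
    """Stand-in for the generator (NOT a GAN): per-position
    nucleotide frequencies with add-one smoothing."""
    probs = []
    for i in range(len(data[0])):
        counts = {base: 1 for base in ALPHABET}
        for seq in data:
            counts[seq[i]] += 1
        total = sum(counts.values())
        probs.append([counts[base] / total for base in ALPHABET])
    return probs


def sample(probs, n, rng):
    """Draw n sequences from the per-position frequency model."""
    return [
        "".join(rng.choices(ALPHABET, weights=p)[0] for p in probs)
        for _ in range(n)
    ]


def feedback_loop(data, rounds=30, n_samples=200, threshold=0.6, seed=0):
    """Refit, sample, score, and feed high scorers back into the data."""
    rng = random.Random(seed)
    data = list(data)
    for _ in range(rounds):
        probs = fit_generator(data)
        candidates = sample(probs, n_samples, rng)
        accepted = [s for s in candidates if analyser(s) >= threshold]
        # Assumed update rule: accepted samples replace the oldest entries.
        k = min(len(accepted), len(data))
        data = data[k:] + accepted[:k]
    return data


def mean_score(data):
    return sum(analyser(s) for s in data) / len(data)


if __name__ == "__main__":
    rng = random.Random(1)
    start = ["".join(rng.choice(ALPHABET) for _ in range(20)) for _ in range(100)]
    end = feedback_loop(start)
    print("mean score before:", mean_score(start))
    print("mean score after: ", mean_score(end))
```

Because the analyser is used only to filter samples, any scoring procedure could be substituted for `analyser` without changing the loop, including one that wraps an external prediction tool.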
Pages: 105-111
Page count: 7