Recognition of CRISPR Off-Target Cleavage Sites with SeqGAN

Times cited: 5
Authors
Li, Wen [1]
Wang, Xiao-Bo [2]
Xu, Yan [1]
Affiliations
[1] Univ Sci & Technol Beijing, Inst Comp Technol, Beijing 100083, Peoples R China
[2] Inst Appl Phys & Computat Math, Beijing 100083, Peoples R China
Keywords
SeqGAN; CRISPR; off-target; data imbalance; CNN; single-guide RNA
DOI
10.2174/1574893616666210727162650
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: The CRISPR system can rapidly edit different gene loci by changing a short sequence on a single-guide RNA. However, off-target events limit its further development, and improving the efficiency and specificity of the technology while minimizing off-target risk remains a challenge. For genome-wide prediction of CRISPR off-target cleavage sites (OTS), an important issue is data imbalance: the number of true OTS identified is much smaller than the number of all possible nucleotide-mismatch loci.
Methods: In this work, a sequence generative adversarial network (SeqGAN) was used to generate positive off-target sequences and thereby augment the Cpf1 OTS dataset. A deep convolutional neural network (CNN) was then trained on the augmented data to obtain a predictor with stronger generalization ability and better performance.
Results: In 10-fold cross-validation, the AUC of the CNN classifier after SeqGAN balancing was 0.941, higher than the 0.863 obtained with the original data and the 0.929 obtained with over-sampling. In independent testing, the AUC after SeqGAN balancing was 0.841, higher than the 0.833 obtained with the original data and the 0.836 obtained with over-sampling. The PR value after SeqGAN balancing was 0.722, about 0.16 higher than with the original data and about 0.03 higher than with over-sampling.
Conclusion: SeqGAN was used for the first time to address data imbalance in CRISPR data. The results show that SeqGAN can effectively generate positive data for CRISPR off-target sites.
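The abstract compares SeqGAN augmentation with plain over-sampling and reports AUC as the metric. As a rough illustration only, the sketch below shows what a random over-sampling baseline and a rank-based AUC computation might look like on toy data. It is not the authors' code and does not implement SeqGAN or the CNN; the function names (`one_hot`, `random_oversample`, `roc_auc`) and the example sequences are hypothetical.

```python
import random

def one_hot(seq):
    """Encode a DNA string as rows of 4 indicators in A, C, G, T order."""
    alphabet = "ACGT"
    return [[1 if base == letter else 0 for letter in alphabet]
            for base in seq.upper()]

def random_oversample(positives, negatives, seed=0):
    """Duplicate randomly chosen positives until both classes are the same size.

    This is the simple over-sampling baseline, not SeqGAN: no new sequences
    are generated, existing positives are only repeated.
    """
    rng = random.Random(seed)
    shortfall = max(0, len(negatives) - len(positives))
    extra = [rng.choice(positives) for _ in range(shortfall)]
    return positives + extra, negatives

def roc_auc(labels, scores):
    """Rank-based AUC: chance that a positive outscores a negative (ties = 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy, made-up sequences: 2 positives (true off-target sites) vs 6 negatives.
toy_pos = ["TTTGACGT", "TTTAACGA"]
toy_neg = ["TTTCAGGT", "TTTGTTGA", "TTTCCCGT", "TTTAGGGA", "TTTGAAAT", "TTTCACGC"]
balanced_pos, balanced_neg = random_oversample(toy_pos, toy_neg)
encoded = [one_hot(s) for s in balanced_pos + balanced_neg]
```

Over-sampling of this kind balances class counts but adds no new sequence diversity, which is the limitation a generative approach such as SeqGAN is meant to address.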
Pages: 101-107
Page count: 7