Prediction of Off-Target Effects in CRISPR/Cas9 System by Ensemble Learning

Cited by: 4
Authors
Fan, Yongxian [1]
Xu, Haibo [1]
Affiliations
[1] Guilin Univ Elect Technol, Sch Comp Sci & Informat Secur, Guilin, Guangxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
CRISPR/Cas9; off-target effects; machine learning; ensemble learning; XGBoost; XGBCRISPR; CRISPR-CAS9; NUCLEASES; RNA; DNA; SPECIFICITY; CLEAVAGE; DESIGN; MODEL; SEQ; SINGLE; SITES;
DOI
10.2174/1574893616666210811100938
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Background: CRISPR/Cas9, a new generation of targeted gene editing technology with low cost and simple operation, has been widely employed in the field of gene editing. Erroneous cutting at off-target sites is called the off-target effect, and it is the biggest complication CRISPR/Cas9 faces in practical applications, because it can lead to unexpected gene editing results. Accurately predicting CRISPR/Cas9 off-target effects is therefore an important task. Predicting off-target effects with machine learning methods is feasible, but most existing off-target prediction tools have paid little attention to how gene sequence encoding affects prediction.
Methods: We compared three encoding methods based on One-Hot and combined the gene sequence with four CRISPR/Cas9 off-target prediction tools to build an ensemble model with XGBoost, designated XGBCRISPR. Grid search was employed to find the optimal parameters.
Results: Performance was compared with existing tools based on the ROC and PRC values. The experimental results show that the XGBCRISPR model is superior to the existing tools.
Conclusion: The new model achieves better prediction results than existing tools, but its accuracy could be improved further as more off-target scores become available.
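The abstract does not specify the three One-Hot variants or the four tools, so the following is only a minimal sketch of the general idea under stated assumptions: each position of an sgRNA/target pair is encoded as the element-wise OR of two 4-bit one-hot vectors (one common convention, assumed here), and the result is concatenated with placeholder scores from other predictors. The sequences, the function names, and the score values are hypothetical. A feature vector of this kind could then be passed to a gradient-boosted classifier such as `xgboost.XGBClassifier`, with hyperparameters tuned by grid search.

```python
# Illustrative sketch only: the encoding scheme, sequences, and scores are
# assumptions, not the ones used by XGBCRISPR.
BASES = "ACGT"


def one_hot(base):
    """Return a 4-element one-hot list; all zeros for an unknown base such as N."""
    return [1 if base == b else 0 for b in BASES]


def encode_pair(sgrna, target):
    """Encode an sgRNA/target pair position by position.

    Each position is the element-wise OR of the two one-hot vectors,
    so a match sets one bit and a mismatch sets two.
    """
    if len(sgrna) != len(target):
        raise ValueError("sequences must have the same length")
    features = []
    for g, t in zip(sgrna.upper(), target.upper()):
        features.extend(a | b for a, b in zip(one_hot(g), one_hot(t)))
    return features


def build_features(sgrna, target, tool_scores):
    """Concatenate the sequence encoding with scores from other predictors."""
    return encode_pair(sgrna, target) + list(tool_scores)


# Hypothetical 23-nt pair (20-nt protospacer + 3-nt PAM) with one mismatch.
sgrna = "GAGTCCGAGCAGAAGAAGAAGGG"
target = "GAGACCGAGCAGAAGAAGAAGGG"
tool_scores = [0.71, 0.43, 0.88, 0.12]  # placeholder outputs of four tools

features = build_features(sgrna, target, tool_scores)
print(len(features))  # 23 positions * 4 bits + 4 scores
```

Encoding a mismatch as two set bits keeps the vector length fixed at four values per position while still marking where the guide and the target differ; other One-Hot variants (for example, concatenating both one-hot vectors into eight values per position) trade a longer vector for preserving which strand carries which base.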
Pages: 1169-1178
Page count: 10