A systematic method for solving data imbalance in CRISPR off-target prediction tasks

Authors
Guan Z. [1]
Jiang Z. [1]
Affiliations
[1] School of Computer Science and Technology, East China Normal University, Shanghai
Keywords
CRISPR/Cas9 system; Data imbalance; Off-target prediction
DOI
10.1016/j.compbiomed.2024.108781
Abstract
Accurately identifying potential off-target sites in the CRISPR/Cas9 system is crucial for improving the efficiency and safety of editing. However, the imbalance of available off-target datasets has been a major obstacle to improving prediction performance. Although several prediction models have been developed to address this issue, systematic research on handling data imbalance in off-target prediction is still lacking. This article systematically investigates the data imbalance issue in off-target datasets and explores numerous methods for handling data imbalance from a novel perspective. First, we highlight the impact of the imbalance problem on off-target prediction tasks by determining the imbalance ratios present in these datasets. Then, we provide a comprehensive review of various sampling techniques and cost-sensitive methods to mitigate class imbalance in off-target datasets. Finally, systematic experiments are conducted on several state-of-the-art prediction models to illustrate the impact of applying data imbalance solutions. The results show that class imbalance processing methods significantly improve the off-target prediction capabilities of the models across multiple testing datasets. The code and datasets used in this study are available at https://github.com/gzrgzx/CRISPR_Data_Imbalance. © 2024 Elsevier Ltd
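The three ideas named in the abstract (imbalance ratio, sampling, and cost-sensitive weighting) can be illustrated with a minimal sketch. This is not the authors' code, which is in the repository linked above. The function names, the toy labels, and the `pair_i` sample identifiers are hypothetical, and random oversampling and inverse-frequency class weights are used only as common, generic representatives of the two method families.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes match the majority size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_x.extend(extra)
        out_y.extend([cls] * len(extra))
    return out_x, out_y


def balanced_class_weights(labels):
    """Cost-sensitive weights inversely proportional to class frequency: n / (k * count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


# Hypothetical toy data: 1 = off-target site (minority), 0 = non-off-target (majority).
labels = [0] * 95 + [1] * 5
samples = [f"pair_{i}" for i in range(len(labels))]

ratio = imbalance_ratio(labels)              # 95 / 5
xs, ys = random_oversample(samples, labels)  # both classes now have 95 entries
weights = balanced_class_weights(labels)     # minority class gets the larger weight

print(ratio, Counter(ys), weights)
```

In practice, resampling should be applied only to the training split, so that the test data keeps its original class distribution; the weights would typically be passed to a model's loss function rather than used to change the data.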