A Novel Predictor for the Analysis and Prediction of Enhancers and Their Strength via Multi-View Features and Deep Forest

Cited by: 5
Authors
Gill, Mehwish [1]
Ahmed, Saeed [1]
Kabir, Muhammad [1,2]
Hayat, Maqsood [3]
Affiliations
[1] Univ Management & Technol, Sch Syst & Technol, Lahore 54770, Pakistan
[2] Lund Univ, Biomed Ctr, B11, S-22184 Lund, Sweden
[3] Abdul Wali Khan Univ, Dept Comp Sci, Mardan 23200, Pakistan
Keywords
enhancers; deep learning; deep forest; bioinformatics; feature representation; learning algorithms; classification; sequence-based models; DNA N-4-METHYLCYTOSINE SITES; SEQUENCE-BASED PREDICTOR; NEURAL-NETWORK; ACCURATE PREDICTION; BINDING-SITES; IDENTIFICATION; SELECTION; PROTEINS; MODEL;
DOI
10.3390/info14120636
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Enhancers are short DNA segments (50-1500 bp) that activate gene transcription when transcription factors (TFs) are present. Genetic variation in enhancers is correlated with numerous human disorders, including cancer and inflammatory bowel disease. In computational biology, the accurate categorization of enhancers can yield important information for drug discovery and development. High-throughput experimental approaches are considered vital tools for studying the key characteristics of enhancers; however, because these techniques are labor-intensive and time-consuming, it can be difficult for researchers to identify enhancers and determine their strength experimentally. Computational techniques are therefore considered an alternative strategy. Based on the types of algorithms used to construct predictors, current methods fall into three primary categories: traditional ML-based techniques, deep learning-based approaches, and ensemble-based methods. In this study, we developed a novel two-layer deep forest-based predictor, NEPERS, for accurate prediction of enhancers and their strength. At the first layer, NEPERS separates enhancers from non-enhancers; at the second layer, it separates strong enhancers from weak ones. To evaluate the effectiveness of feature fusion, block-wise deep forest and other algorithms were combined with multi-view features such as PSTNPss, PSTNPdss, CKSNAP, and NCP, and assessed via 10-fold cross-validation and independent testing. On the benchmark dataset, the proposed technique performs better than competing models across all metrics, with an ACC of 0.876, Sen of 0.864, Spe of 0.888, MCC of 0.753, and AUC of 0.940 for layer 1, and an ACC of 0.959, Sen of 0.960, Spe of 0.958, MCC of 0.918, and AUC of 0.990 for layer 2. Similarly, on the independent test, the ACC, Sen, Spe, MCC, and AUC were 0.863, 0.865, 0.860, 0.725, and 0.948 for layer 1 and 0.890, 0.940, 0.840, 0.784, and 0.951 for layer 2, respectively. This study provides conclusive insights for the accurate and effective detection and characterization of enhancers and their strengths.
Pages: 16