A Novel Predictor for the Analysis and Prediction of Enhancers and Their Strength via Multi-View Features and Deep Forest

Cited by: 5
Authors
Gill, Mehwish [1]
Ahmed, Saeed [1]
Kabir, Muhammad [1,2]
Hayat, Maqsood [3]
Affiliations
[1] Univ Management & Technol, Sch Syst & Technol, Lahore 54770, Pakistan
[2] Lund Univ, Biomed Ctr, B11, S-22184 Lund, Sweden
[3] Abdul Wali Khan Univ, Dept Comp Sci, Mardan 23200, Pakistan
Keywords
enhancers; deep learning; deep forest; bioinformatics; feature representation; learning algorithms; classification; sequence-based models; DNA N-4-METHYLCYTOSINE SITES; SEQUENCE-BASED PREDICTOR; NEURAL-NETWORK; ACCURATE PREDICTION; BINDING-SITES; IDENTIFICATION; SELECTION; PROTEINS; MODEL;
DOI
10.3390/info14120636
Chinese Library Classification number
TP [Automation Technology; Computer Technology]
Discipline classification code
0812
Abstract
Enhancers are short DNA segments (50-1500 bp) that activate gene transcription when transcription factors (TFs) are present. Genetic variation in enhancers is correlated with numerous human disorders, including cancer and inflammatory bowel disease. In computational biology, the accurate categorization of enhancers can yield important information for drug discovery and development. High-throughput experimental approaches are vital tools for studying the key characteristics of enhancers; however, because these techniques are labor-intensive and time-consuming, it can be difficult for researchers to identify enhancers and determine their strengths experimentally. Computational techniques are therefore considered an alternative strategy. Based on the type of algorithm used to construct the predictor, current methods fall into three primary categories: traditional ML-based techniques, deep learning-based approaches, and ensemble-based methods. In this study, we developed a novel two-layer deep forest-based predictor, NEPERS, for accurate enhancer and strength prediction. NEPERS separates enhancers from non-enhancers in the first layer and strong enhancers from weak enhancers in the second layer. To evaluate the effectiveness of feature fusion, block-wise deep forest and other algorithms were combined with multi-view features such as PSTNPss, PSTNPdss, CKSNAP, and NCP and assessed via 10-fold cross-validation and independent testing. Our proposed technique performs better than competing models across all metrics, with an ACC of 0.876, Sen of 0.864, Spe of 0.888, MCC of 0.753, and AUC of 0.940 for layer 1 and an ACC of 0.959, Sen of 0.960, Spe of 0.958, MCC of 0.918, and AUC of 0.990 for layer 2 on the benchmark dataset. Similarly, on the independent test, the ACC, Sen, Spe, MCC, and AUC were 0.863, 0.865, 0.860, 0.725, and 0.948 for layer 1 and 0.890, 0.940, 0.840, 0.784, and 0.951 for layer 2, respectively. This study provides conclusive insights for the accurate and effective detection and characterization of enhancers and their strengths.
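The abstract names two of its sequence encodings (NCP and CKSNAP) and four threshold-based metrics (ACC, Sen, Spe, MCC) without defining them. The following is a minimal sketch of their commonly used definitions, not the authors' implementation: the function names, the gap range `k_max`, and the restriction to the alphabet A/C/G/T are assumptions made for illustration. The PSTNP features and the deep forest itself are omitted because they depend on training-set statistics not given here.

```python
from itertools import product
from math import sqrt

# Nucleotide chemical property (NCP) codes, in the order
# (ring structure, functional group, hydrogen bonding).
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "T": (0, 0, 1)}


def ncp_encode(seq):
    """Flatten a DNA sequence into three binary values per nucleotide."""
    return [bit for base in seq.upper() for bit in NCP[base]]


def cksnap(seq, k_max=2):
    """Frequencies of the 16 nucleotide pairs separated by 0..k_max positions.

    Assumes the sequence contains only A, C, G, and T.
    """
    seq = seq.upper()
    pairs = ["".join(p) for p in product("ACGT", repeat=2)]
    features = []
    for k in range(k_max + 1):
        n = len(seq) - k - 1  # number of pairs available at this gap
        if n <= 0:
            raise ValueError("sequence too short for the requested gap")
        counts = dict.fromkeys(pairs, 0)
        for i in range(n):
            counts[seq[i] + seq[i + k + 1]] += 1
        features.extend(counts[p] / n for p in pairs)
    return features


def metrics(tp, tn, fp, fn):
    """ACC, Sen, Spe, and MCC from binary confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sen = tp / (tp + fn)  # sensitivity: recall on the positive class
    spe = tn / (tn + fp)  # specificity: recall on the negative class
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "Sen": sen, "Spe": spe, "MCC": mcc}
```

In a two-layer setup like the one described, the same metric function would be applied once to the enhancer versus non-enhancer predictions and once to the strong versus weak predictions.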
Pages: 16
Related papers
46 in total
  • [1] SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins
    Ahmad, Saeed
    Charoenkwan, Phasit
    Quinn, Julian M. W.
    Moni, Mohammad Ali
    Hasan, Md Mehedi
    Lio, Pietro
    Shoombuatong, Watshara
    [J]. SCIENTIFIC REPORTS, 2022, 12 (01)
  • [2] System concentration shift as a regulator of transcription-translation system within liposomes
    Akui, Toshiki
    Fujiwara, Kei
    Sato, Gaku
    Takinoue, Masahiro
    Nomura, Shin-ichiro M.
    Doi, Nobuhide
    [J]. ISCIENCE, 2021, 24 (08)
  • [3] DeepCPPred: A Deep Learning Framework for the Discrimination of Cell-Penetrating Peptides and Their Uptake Efficiencies
    Arif, Muhammad
    Kabir, Muhammad
    Ahmed, Saeed
    Khan, Abid
    Ge, Fang
    Khelifi, Adel
    Yu, Dong-Jun
    [J]. IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2022, 19 (05) : 2749 - 2759
  • [4] Asim Muhammad Nabeel, 2020, Neural Information Processing. 27th International Conference, ICONIP 2020. Proceedings. Lecture Notes in Computer Science (LNCS 12534), P38, DOI 10.1007/978-3-030-63836-8_4
  • [5] An Interpretable Prediction Model for Identifying N7-Methylguanosine Sites Based on XGBoost and SHAP
    Bi, Yue
    Xiang, Dongxu
    Ge, Zongyuan
    Li, Fuyi
    Jia, Cangzhi
    Song, Jiangning
    [J]. MOLECULAR THERAPY-NUCLEIC ACIDS, 2020, 22 : 362 - 372
  • [6] iEnhancer-XG: interpretable sequence-based enhancers and their strength predictor
    Cai, Lijun
    Ren, Xuanbai
    Fu, Xiangzheng
    Peng, Li
    Gao, Mingyu
    Zeng, Xiangxiang
    [J]. BIOINFORMATICS, 2021, 37 (08) : 1060 - 1067
  • [7] AMYPred-FRL is a novel approach for accurate prediction of amyloid proteins by using feature representation learning
    Charoenkwan, Phasit
    Ahmed, Saeed
    Nantasenamat, Chanin
    Quinn, Julian M. W.
    Moni, Mohammad Ali
    Lio, Pietro
    Shoombuatong, Watshara
    [J]. SCIENTIFIC REPORTS, 2022, 12 (01)
  • [8] iDNA4mC: identifying DNA N4-methylcytosine sites based on nucleotide chemical properties
    Chen, Wei
    Yang, Hui
    Feng, Pengmian
    Ding, Hui
    Lin, Hao
    [J]. BIOINFORMATICS, 2017, 33 (22) : 3518 - 3523
  • [9] iLearn: an integrated platform and meta-learner for feature engineering, machine-learning analysis and modeling of DNA, RNA and protein sequence data
    Chen, Zhen
    Zhao, Pei
    Li, Fuyi
    Marquez-Lago, Tatiana T.
    Leier, Andre
    Revote, Jerico
    Zhu, Yan
    Powell, David R.
    Akutsu, Tatsuya
    Webb, Geoffrey I.
    Chou, Kuo-Chen
    Smith, A. Ian
    Daly, Roger J.
    Li, Jian
    Song, Jiangning
    [J]. BRIEFINGS IN BIOINFORMATICS, 2020, 21 (03) : 1047 - 1057
  • [10] iFeature: a Python package and web server for features extraction and selection from protein and peptide sequences
    Chen, Zhen
    Zhao, Pei
    Li, Fuyi
    Leier, Andre
    Marquez-Lago, Tatiana T.
    Wang, Yanan
    Webb, Geoffrey I.
    Smith, A. Ian
    Daly, Roger J.
    Chou, Kuo-Chen
    Song, Jiangning
    [J]. BIOINFORMATICS, 2018, 34 (14) : 2499 - 2502