A Novel Predictor for the Analysis and Prediction of Enhancers and Their Strength via Multi-View Features and Deep Forest

Cited by: 5

Authors:

Gill, Mehwish [1]

Ahmed, Saeed [1]

Kabir, Muhammad [1,2]

Hayat, Maqsood [3]

Affiliations:

[1] Univ Management & Technol, Sch Syst & Technol, Lahore 54770, Pakistan

[2] Lund Univ, Biomed Ctr, B11, S-22184 Lund, Sweden

[3] Abdul Wali Khan Univ, Dept Comp Sci, Mardan 23200, Pakistan

Source:

INFORMATION | 2023 / Vol. 14 / No. 12

Keywords:

enhancers; deep learning; deep forest; bioinformatics; feature representation; learning algorithms; classification; sequence-based models; DNA N-4-METHYLCYTOSINE SITES; SEQUENCE-BASED PREDICTOR; NEURAL-NETWORK; ACCURATE PREDICTION; BINDING-SITES; IDENTIFICATION; SELECTION; PROTEINS; MODEL

DOI:

10.3390/info14120636

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

Enhancers are short DNA segments (50-1500 bp) that effectively activate gene transcription when transcription factors (TFs) are present. Genetic variation in enhancers has been correlated with numerous human disorders, including cancer and inflammatory bowel disease. In computational biology, accurate classification of enhancers can yield important information for drug discovery and development. High-throughput experimental approaches are considered vital tools for studying the key characteristics of enhancers; however, because these techniques are labor-intensive and time-consuming, it can be difficult for researchers to identify enhancers and predict their strength. Computational techniques are therefore considered an alternative strategy for addressing this problem. Based on the type of algorithm used to construct the predictor, current methods fall into three primary categories: ensemble-based methods, deep learning-based approaches, and traditional ML-based techniques. In this study, we developed NEPERS, a novel two-layer predictor based on deep forest, for accurate prediction of enhancers and their strength. The first layer of NEPERS distinguishes enhancers from non-enhancers, and the second layer distinguishes strong enhancers from weak enhancers. To evaluate the effectiveness of feature fusion, block-wise deep forest and other algorithms were combined with multi-view features such as PSTNPss, PSTNPdss, CKSNAP, and NCP, and assessed via 10-fold cross-validation and independent testing. On the benchmark dataset, the proposed method outperforms competing models on all metrics, with an accuracy (ACC) of 0.876, sensitivity (Sen) of 0.864, specificity (Spe) of 0.888, Matthews correlation coefficient (MCC) of 0.753, and area under the ROC curve (AUC) of 0.940 for layer 1, and an ACC of 0.959, Sen of 0.960, Spe of 0.958, MCC of 0.918, and AUC of 0.990 for layer 2.
Similarly, on the independent test set, the ACC, Sen, Spe, MCC, and AUC were 0.863, 0.865, 0.860, 0.725, and 0.948 for layer 1 and 0.890, 0.940, 0.840, 0.784, and 0.951 for layer 2, respectively. This study provides conclusive insights for the accurate and effective detection and characterization of enhancers and their strength.
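The abstract reports ACC, Sen, Spe and MCC for each layer. As a reading aid, the sketch below computes these four metrics from a binary confusion matrix using their standard textbook definitions; it is not code from NEPERS. The `Confusion` class and `metrics` function are hypothetical names, and the counts in the example are made up (100 positives, 100 negatives). AUC is omitted because it requires per-sample prediction scores rather than a single confusion matrix.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion-matrix counts (hypothetical helper).

    'Positive' would be enhancer in layer 1 and strong enhancer in layer 2.
    """
    tp: int  # positives predicted as positive
    tn: int  # negatives predicted as negative
    fp: int  # negatives predicted as positive
    fn: int  # positives predicted as negative


def metrics(c: Confusion) -> dict:
    """Compute ACC, Sen, Spe and MCC with their standard definitions."""
    acc = (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn)
    sen = c.tp / (c.tp + c.fn)  # sensitivity: fraction of positives recovered
    spe = c.tn / (c.tn + c.fp)  # specificity: fraction of negatives recovered
    denom = math.sqrt(
        (c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn)
    )
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (c.tp * c.tn - c.fp * c.fn) / denom if denom else 0.0
    return {"ACC": acc, "Sen": sen, "Spe": spe, "MCC": mcc}


if __name__ == "__main__":
    # Made-up counts: 100 positives (94 recovered), 100 negatives (84 recovered).
    m = metrics(Confusion(tp=94, tn=84, fp=16, fn=6))
    print({k: round(v, 3) for k, v in m.items()})
    # {'ACC': 0.89, 'Sen': 0.94, 'Spe': 0.84, 'MCC': 0.784}
```

When the two classes are equally sized, ACC reduces to the mean of Sen and Spe. Every ACC value in the abstract agrees with that identity, for example (0.864 + 0.888) / 2 = 0.876 for layer 1 on the benchmark dataset; MCC, by contrast, also penalizes imbalance between false positives and false negatives.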

Pages: 16

References: 46

