StackEPI: identification of cell line-specific enhancer-promoter interactions based on stacking ensemble learning

Cited by: 4
Authors
Fan, Yongxian [1]
Peng, Binchao [1]
Affiliations
[1] Guilin Univ Elect Technol, Sch Comp Sci & Informat Secur, Guilin 541004, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Enhancer-promoter interaction; Bioinformatics; Machine learning; Stacking strategy; Feature extraction; 3D GENOME; PRINCIPLES; PSEKNC
DOI
10.1186/s12859-022-04821-9
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Understanding how enhancer-promoter interactions (EPIs) regulate the expression of specific genes in cells contributes to the understanding of gene regulation, cell differentiation, and related processes, and identifying EPIs has been a challenging task. On the one hand, identifying EPIs with traditional wet-lab experimental methods often entails substantial human labor and time costs. On the other hand, although currently proposed computational methods recognize EPIs well, they generally require long training times.
Results: In this study, we studied the EPIs of six human cell lines and designed StackEPI, a cell line-specific EPI prediction method based on a stacking ensemble learning strategy, which has better prediction performance and faster training speed. Specifically, by combining different encoding schemes and machine learning methods, our prediction method can comprehensively extract cell line-specific information from enhancer and promoter sequences from multiple perspectives and accurately recognize cell line-specific EPIs. The source code implementing StackEPI and the experimental data are available at https://github.com/20032303092/StackEPI.git.
Conclusions: The comparison results show that our model delivers better performance in identifying cell line-specific EPIs and outperforms other state-of-the-art models. In addition, our model is computationally more efficient.
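Illustrative sketch (not the authors' code): the abstract describes stacking, in which base learners trained on different sequence encodings feed a meta-learner. The minimal Python example below shows that general idea on synthetic data. Everything in it is an assumption made for illustration: the k-mer frequency encodings, the toy nearest-centroid base learner, the logistic meta-learner, the synthetic GC-rich/AT-rich sequences, and all function and class names. The actual StackEPI encodings and models are in the repository linked in the abstract.

```python
import math
import random
from itertools import product


def kmer_freq(seq, k):
    """Normalized k-mer frequency vector over the alphabet ACGT."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows with other symbols (e.g. N) are skipped
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts[m] / total for m in kmers]


def encode_pair(enhancer, promoter, k):
    """Concatenate the enhancer and promoter encodings into one vector."""
    return kmer_freq(enhancer, k) + kmer_freq(promoter, k)


class CentroidScorer:
    """Toy base learner (assumption): scores a sample by which class centroid is closer."""

    def fit(self, X, y):
        # Assumes both classes are present in the training data.
        self.pos = self._mean([x for x, t in zip(X, y) if t == 1])
        self.neg = self._mean([x for x, t in zip(X, y) if t == 0])
        return self

    def predict_proba(self, X):
        return [1 / (1 + math.exp(-10 * (math.dist(x, self.neg) - math.dist(x, self.pos))))
                for x in X]

    @staticmethod
    def _mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]


class LogisticMeta:
    """Meta-learner: logistic regression fitted by stochastic gradient descent."""

    def fit(self, Z, y, lr=0.5, epochs=500):
        self.w, self.b = [0.0] * len(Z[0]), 0.0
        for _ in range(epochs):
            for z, t in zip(Z, y):
                err = self._p(z) - t
                self.w = [w - lr * err * zi for w, zi in zip(self.w, z)]
                self.b -= lr * err
        return self

    def _p(self, z):
        return 1 / (1 + math.exp(-(self.b + sum(w * zi for w, zi in zip(self.w, z)))))

    def predict(self, Z):
        return [int(self._p(z) >= 0.5) for z in Z]


def out_of_fold(make_model, X, y, n_folds):
    """Out-of-fold scores, so the meta-learner never sees a base learner's training fit."""
    oof = [0.0] * len(X)
    for f in range(n_folds):
        test = [i for i in range(len(X)) if i % n_folds == f]
        train = [i for i in range(len(X)) if i % n_folds != f]
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(test, model.predict_proba([X[i] for i in test])):
            oof[i] = p
    return oof


def random_seq(rng, length, gc):
    """Synthetic DNA with a given GC fraction (stand-in for real sequences)."""
    return "".join(rng.choice("GC") if rng.random() < gc else rng.choice("AT")
                   for _ in range(length))


rng = random.Random(0)
y = [i % 2 for i in range(24)]  # synthetic labels: 1 = interacting pair
pairs = [(random_seq(rng, 200, 0.7 if t else 0.3),
          random_seq(rng, 200, 0.7 if t else 0.3)) for t in y]

# One base learner per encoding (k = 1 and k = 2), stacked through out-of-fold scores.
views = {k: [encode_pair(e, p, k) for e, p in pairs] for k in (1, 2)}
Z = list(zip(*(out_of_fold(CentroidScorer, views[k], y, 3) for k in views)))
meta = LogisticMeta().fit(Z, y)
pred = meta.predict(Z)
accuracy = sum(p == t for p, t in zip(pred, y)) / len(y)
print(f"training accuracy on synthetic data: {accuracy:.2f}")
```

The out-of-fold step is the part that makes this stacking rather than simple averaging: each base learner scores only samples it was not trained on, and the meta-learner is fitted on those held-out scores. For brevity the sketch omits refitting the base learners on all data and evaluating on an independent test set.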
Pages: 18