Prediction of Super-enhancers Based on Mean-shift Undersampling

Times cited: 0
Authors
Cheng, Han [1]
Ding, Shumei [1]
Jia, Cangzhi [1]
Affiliations
[1] Dalian Maritime Univ, Sch Sci, Dalian 116026, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Super-enhancers; sequence information; XGBoost; mean-shift; clustering; under-sampling
DOI
10.2174/0115748936268302231110111456
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Super-enhancers are clusters of enhancers defined by the binding occupancy of master transcription factors, chromatin regulators, or chromatin marks. Super-enhancers have been reported to be more transcriptionally active and more cell-type-specific than regular enhancers, so it is necessary to distinguish them from regular enhancers. A variety of computational methods have been proposed as auxiliary tools for identifying super-enhancers. However, most of these methods rely on ChIP-seq data; when such data are unavailable, the predictor either cannot run or fails to achieve satisfactory performance.

Objective: The aim of this study is to propose a stacking computational model, based on the fusion of multiple features, that identifies super-enhancers in both human and mouse.

Methods: Mean-shift was used to cluster the majority-class samples, and four balanced datasets for mouse and three balanced datasets for human were selected to train the stacking model. Five types of sequence information are used as input to the XGBoost classifiers, and the average of the probability outputs of the classifiers is taken as the final classification result.

Results: The results of 10-fold cross-validation and cross-cell-line validation show that the proposed method outperforms other existing methods. The source code and datasets are available at https://github.com/Cheng-Han-max/SE_voting.

Conclusion: The analysis of feature importance indicates that Mismatch accounts for the highest proportion among the top 20 important features.
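The undersampling and averaging steps described in the Methods can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation (that is in the repository linked above). The flat-kernel mean-shift, the round-robin selection of majority samples across clusters, and the names `mean_shift`, `balanced_subsets`, and `average_probabilities` are all assumptions made for illustration. The XGBoost classifiers are omitted; the per-classifier probabilities are simply passed in as lists.

```python
import math
import random


def mean_shift(points, bandwidth, n_iter=50):
    """Flat-kernel mean-shift (illustrative); returns one cluster label per point."""
    modes = [list(p) for p in points]
    for _ in range(n_iter):
        shifted = []
        for m in modes:
            # Original points that fall inside the window around the current estimate.
            near = [p for p in points if math.dist(m, p) <= bandwidth]
            shifted.append([sum(c) / len(near) for c in zip(*near)])
        modes = shifted
    # Merge modes that ended up at (nearly) the same location into one cluster.
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if math.dist(m, c) <= bandwidth / 2:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels


def balanced_subsets(majority, labels, n_minority, n_sets, seed=0):
    """Draw n_sets majority subsets of size n_minority (assumed selection rule:
    round-robin over clusters, so every cluster is represented)."""
    rng = random.Random(seed)
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    subsets = []
    for _ in range(n_sets):
        # A freshly shuffled copy of each cluster's member indices.
        pools = [rng.sample(members, len(members)) for members in clusters.values()]
        chosen = []
        while len(chosen) < n_minority and any(pools):
            for pool in pools:
                if pool and len(chosen) < n_minority:
                    chosen.append(pool.pop())
        subsets.append([majority[i] for i in chosen])
    return subsets


def average_probabilities(per_model_probs):
    """Average the per-sample probabilities produced by several classifiers."""
    return [sum(col) / len(col) for col in zip(*per_model_probs)]


if __name__ == "__main__":
    # Toy majority class with two well-separated groups (made-up data).
    majority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    labels = mean_shift(majority, bandwidth=1.0)
    for subset in balanced_subsets(majority, labels, n_minority=4, n_sets=3):
        print(subset)
    # Probabilities from two hypothetical classifiers for two samples.
    print(average_probabilities([[0.2, 0.8], [0.4, 0.6]]))
```

In this sketch each balanced subset would be combined with the full minority class to train one classifier, and the averaged probability would then be thresholded to give the final label. How the paper actually picks samples from each cluster may differ.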
Pages: 651-662
Page count: 12
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1995, 17 (08) : 790 - 799