A New Sequential Forward Feature Selection (SFFS) Algorithm for Mining Best Topological and Biological Features to Predict Protein Complexes from Protein-Protein Interaction Networks (PPINs)

Cited by: 12
Authors
Younis, Haseeb [1, 2]
Anwar, Muhammad Waqas [2]
Khan, Muhammad Usman Ghani [3]
Sikandar, Aisha [4]
Bajwa, Usama Ijaz [2]
Affiliations
[1] Univ Management & Technol, Sch Profess Advancement, Lahore, Pakistan
[2] COMSATS Univ Islamabad, Dept Comp Sci, Lahore, Pakistan
[3] Univ Engn & Technol, Dept Comp Sci & Engn, Lahore, Pakistan
[4] Govt Girls Post Grad Coll 1 Abbottabad, Abbottabad, Pakistan
Keywords
Protein complex detection; Protein-protein interaction network; Machine learning; Complex topology; RECOGNITION; PATHWAYS; DATABASE; AAINDEX; TOOL
DOI
10.1007/s12539-021-00433-8
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Protein-protein interactions play an important role in understanding biological processes in the body. A network of dynamic protein complexes within a cell that regulates most biological processes is known as a protein-protein interaction network (PPIN). Predicting complexes from PPINs is a challenging task. Most previous computational approaches mine cliques, stars, linear and hybrid structures as complexes from PPINs by considering topological features, and few of them focus on the important biological information contained in protein amino acid sequences. In this study, we compute a wide variety of topological features and integrate them with biological features computed from protein amino acid sequences, such as bag-of-words, physicochemical and spectral domain features. We propose a new Sequential Forward Feature Selection (SFFS) algorithm, i.e., random forest-based Boruta feature selection, for selecting the best features from the large computed feature set. Decision tree, linear discriminant analysis and gradient boosting classifiers are used as learners. We conducted experiments on two reference protein complex datasets of yeast, i.e., CYC2008 and MIPS. Human and mouse complex information is taken from the CORUM 3.0 dataset. Protein interaction information is extracted from the Database of Interacting Proteins (DIP). Our proposed SFFS, i.e., random forest-based Boruta feature selection in combination with decision tree, linear discriminant analysis and gradient boosting classifiers, outperforms other state-of-the-art algorithms, achieving precision, recall and F-measure of 94.58%, 94.92% and 94.45% for MIPS; 96.31%, 93.55% and 96.02% for CYC2008; 98.84%, 98.00% and 98.87% for CORUM human; and 96.60%, 96.70% and 96.32% for CORUM mouse complexes, respectively.
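For orientation, the generic idea behind sequential forward feature selection can be sketched in a few lines of Python. This is the textbook greedy procedure only, not the authors' random forest-based Boruta variant, whose details are not given in the abstract. The names `forward_select` and `toy_score` and the gain values are hypothetical, chosen purely for illustration. In a real experiment the scoring function would presumably be something like the cross-validated F-measure of a classifier trained on the candidate feature subset.

```python
from typing import Callable, List, Sequence


def forward_select(
    n_features: int,
    score: Callable[[Sequence[int]], float],
    max_features: int,
) -> List[int]:
    """Generic greedy forward selection (illustrative, not the paper's method).

    Starts from an empty subset and repeatedly adds the single feature that
    raises the score the most, stopping when no addition improves the score
    or when max_features have been chosen.
    """
    selected: List[int] = []
    best_score = float("-inf")
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        # Score every candidate when appended to the current subset.
        scored = [(score(selected + [f]), f) for f in candidates]
        top_score, top_feature = max(scored, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # no remaining feature improves the score
        selected.append(top_feature)
        best_score = top_score
    return selected


def toy_score(subset: Sequence[int]) -> float:
    """Hypothetical stand-in for a cross-validated classifier score."""
    # Assumed per-feature gains (made-up numbers) minus a small size penalty.
    gains = {0: 0.10, 1: 0.50, 2: 0.30, 3: 0.00}
    return sum(gains[f] for f in subset) - 0.05 * len(subset)


print(forward_select(4, toy_score, max_features=4))  # prints [1, 2, 0]
```

In this toy run, feature 3 is never selected because adding it lowers the score, which is the stopping behaviour that distinguishes forward selection from simply ranking features once.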
Pages: 371-388
Number of pages: 18
Related papers
23 in total
  • [1] A New Sequential Forward Feature Selection (SFFS) Algorithm for Mining Best Topological and Biological Features to Predict Protein Complexes from Protein–Protein Interaction Networks (PPINs)
    Younis, Haseeb
    Anwar, Muhammad Waqas
    Khan, Muhammad Usman Ghani
    Sikandar, Aisha
    Bajwa, Usama Ijaz
    Interdisciplinary Sciences: Computational Life Sciences, 2021, 13 : 371 - 388
  • [2] Identification of essential proteins based on a new combination of topological and biological features in weighted protein-protein interaction networks
    Elahi, Abdolkarim
    Babamir, Seyed Morteza
    IET SYSTEMS BIOLOGY, 2018, 12 (06) : 247 - 257
  • [3] A New Structure Feature Introduced to Predict Protein-Protein Interaction Sites
    Lai, Lingwei
    Geng, Jing
    Duan, Haochen
    Chen, Siyuan
    Huang, Lvwen
    Yu, Jiantao
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2025,
  • [4] Detecting Protein Complexes from Signed Protein-Protein Interaction Networks
    Ou-Yang, Le
    Dai, Dao-Qing
    Zhang, Xiao-Fei
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2015, 12 (06) : 1333 - 1344
  • [5] Detecting temporal protein complexes from dynamic protein-protein interaction networks
    Ou-Yang, Le
    Dai, Dao-Qing
    Li, Xiao-Li
    Wu, Min
    Zhang, Xiao-Fei
    Yang, Peng
    BMC BIOINFORMATICS, 2014, 15
  • [6] Identifying protein complexes based on node embeddings obtained from protein-protein interaction networks
    Liu, Xiaoxia
    Yang, Zhihao
    Sang, Shengtian
    Zhou, Ziwei
    Wang, Lei
    Zhang, Yin
    Lin, Hongfei
    Wang, Jian
    Xu, Bo
    BMC BIOINFORMATICS, 2018, 19
  • [7] Resilience of protein-protein interaction networks as determined by their large-scale topological features
    Rodrigues, Francisco A.
    Costa, Luciano da Fontoura
    Barbieri, Andre Luiz
    MOLECULAR BIOSYSTEMS, 2011, 7 (04) : 1263 - 1269
  • [8] Mining functional subgraphs from cancer protein-protein interaction networks
    Shen, Ru
    Goonesekere, Nalin C. W.
    Guda, Chittibabu
    BMC SYSTEMS BIOLOGY, 2012, 6
  • [9] An Effective Link-Based Clustering Algorithm for Detecting Overlapping Protein Complexes in Protein-Protein Interaction Networks
    Hu, Lun
    Zhang, Jun
    Pan, Xiangyu
    Luo, Xin
    Yuan, Huaqiang
    IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, 2021, 8 (04) : 3275 - 3289
  • [10] A novel link prediction algorithm for reconstructing protein-protein interaction networks by topological similarity
    Lei, Chengwei
    Ruan, Jianhua
    BIOINFORMATICS, 2013, 29 (03) : 355 - 364