A New Sequential Forward Feature Selection (SFFS) Algorithm for Mining Best Topological and Biological Features to Predict Protein Complexes from Protein-Protein Interaction Networks (PPINs)

Cited by: 12

Authors:

Younis, Haseeb ^{[1,2]}

Anwar, Muhammad Waqas ^{[2]}

Khan, Muhammad Usman Ghani ^{[3]}

Sikandar, Aisha ^{[4]}

Bajwa, Usama Ijaz ^{[2]}

Affiliations:

[1] Univ Management & Technol, Sch Profess Advancement, Lahore, Pakistan

[2] COMSATS Univ Islamabad, Dept Comp Sci, Lahore, Pakistan

[3] Univ Engn & Technol, Dept Comp Sci & Engn, Lahore, Pakistan

[4] Govt Girls Post Grad Coll 1 Abbottabad, Abbottabad, Pakistan

Source:

INTERDISCIPLINARY SCIENCES-COMPUTATIONAL LIFE SCIENCES | 2021 / Vol. 13 / Issue 03

Keywords:

Protein complex detection; Protein-protein interaction network; Machine learning; Complex topology; RECOGNITION; PATHWAYS; DATABASE; AAINDEX; TOOL

DOI:

10.1007/s12539-021-00433-8

Chinese Library Classification (CLC) number:

Q [Biological Sciences];

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

Protein-protein interactions play an important role in understanding biological processes in the body. A network of dynamic protein complexes within a cell that regulates most biological processes is known as a protein-protein interaction network (PPIN). Predicting complexes from PPINs is a challenging task. Most previous computational approaches mine cliques, stars, linear and hybrid structures as complexes from PPINs by considering topological features; fewer focus on the important biological information contained in protein amino acid sequences. In this study, we compute a wide variety of topological features and integrate them with biological features computed from protein amino acid sequences, such as bag-of-words, physicochemical and spectral-domain features. We propose a new Sequential Forward Feature Selection (SFFS) algorithm, i.e., random forest-based Boruta feature selection, for selecting the best features from the large computed feature set. Decision tree, linear discriminant analysis and gradient boosting classifiers are used as learners. We conducted experiments on two reference protein complex datasets for yeast, CYC2008 and MIPS. Human and mouse complex information is taken from the CORUM 3.0 dataset. Protein interaction information is extracted from the Database of Interacting Proteins (DIP). Our proposed SFFS, i.e., random forest-based Boruta feature selection in combination with decision tree, linear discriminant analysis and gradient boosting classifiers, outperforms other state-of-the-art algorithms, achieving precision, recall and F-measure of 94.58%, 94.92% and 94.45% for MIPS; 96.31%, 93.55% and 96.02% for CYC2008; 98.84%, 98.00% and 98.87% for CORUM human; and 96.60%, 96.70% and 96.32% for CORUM mouse complexes, respectively.
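The abstract names sequential forward feature selection but does not spell out the procedure. As a rough illustration only, the generic greedy idea can be sketched as below. This is not the authors' implementation and it does not reproduce the random forest-based Boruta step; the feature names and the `toy_score` function are hypothetical stand-ins for a real cross-validated classifier score.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence


def sequential_forward_selection(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    max_features: Optional[int] = None,
) -> List[str]:
    """Greedily add the feature that most improves `score`; stop when nothing helps."""
    selected: List[str] = []
    remaining = list(features)
    best_score = score(frozenset())
    limit = len(remaining) if max_features is None else max_features

    while remaining and len(selected) < limit:
        # Evaluate every remaining feature when added to the current subset.
        candidates = [(score(frozenset(selected + [f])), f) for f in remaining]
        top_score, top_feature = max(candidates, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # no candidate improves the score, so stop early
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected


# Hypothetical per-feature gains (assumption for illustration, not from the paper).
TOY_GAIN = {
    "degree": 0.30,
    "density": 0.25,
    "clustering_coeff": 0.20,
    "hydrophobicity": 0.15,
    "noise": 0.00,
}


def toy_score(subset: FrozenSet[str]) -> float:
    # Sum of gains minus a small per-feature penalty that mimics overfitting cost.
    return sum(TOY_GAIN[f] for f in subset) - 0.05 * len(subset)


selected = sequential_forward_selection(list(TOY_GAIN), toy_score)
print(selected)
```

In practice `score` would wrap a learner such as a decision tree evaluated on held-out data; the toy function only makes the stopping behaviour easy to follow, since the uninformative `noise` feature never improves the score and is left out.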


Pages: 371-388

Page count: 18

