Decision Tree Based Approaches for Detecting Protein Complex in Protein Protein Interaction Network (PPI) via Link and Sequence Analysis

Cited by: 29

Authors:

Sikandar, Aisha [1]

Anwar, Waqas [2]

Bajwa, Usama Ijaz [2]

Wang, Xuan [3]

Sikandar, Misba [1]

Yao, Lin [3]

Jiang, Zoe L. [3]

Zhang, Chunkai [3]

Affiliations:

[1] COMSATS Inst Informat Technol, Dept Comp Sci, Abbottabad 22060, Pakistan

[2] COMSATS Inst Informat Technol, Dept Comp Sci, Lahore 54000, Pakistan

[3] Harbin Inst Technol, Shenzhen Grad Sch, Shenzhen 518055, Peoples R China

Source:

IEEE ACCESS | 2018 / Volume 6

Keywords:

Protein-protein interaction (PPI); decision tree classifier; graph topological patterns; biological features; ALGORITHM;

DOI:

10.1109/ACCESS.2018.2807811

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

A network of modular protein complexes inside a cell coordinates many biological processes and is known as a protein-protein interaction (PPI) network. A PPI network can be modeled as a graph in which edges represent interactions among proteins and subgraphs represent protein complexes. Previous methods for mining protein complexes from PPI networks focused mainly on a few topological features, such as density and degree statistics, based on the assumption that proteins inside a complex interact heavily with each other and thus form dense subgraphs. While this assumption is true for some complexes, it does not hold for many others. The important biological information within protein amino acid sequences, which can be used to estimate the interaction propensity between two proteins performing a specific biological function, is not considered in most previous studies. There is a need for algorithms that consider both topological and biological features in order to correctly identify protein complexes with varying topological structures and biological patterns inside a PPI network. In this paper, we present an algorithm for detecting protein complexes from interaction graphs. Using graph topological patterns and biological properties as features, we model each complex subgraph with decision tree learners. We use a training set of known complexes to construct decision trees in a depth-first and best-first manner using a divide-and-conquer strategy. Splitting criteria, such as information gain and Gini gain, are used in the tree expansion process. The training set is divided into subsets, and each subset is represented as a branch of the tree. Pruning techniques are used to reduce the size of the tree. We applied our method to yeast protein interaction data on two benchmark data sets, i.e., MIPS and CYC2008. According to our results, decision trees achieved a considerable improvement over clique-based algorithms in their ability to recover known complexes by using integrated biological and topological properties.
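The two splitting criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `density` feature values, and the `is_complex` labels are hypothetical toy data chosen only to show how information gain and Gini gain score a candidate threshold split.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(values, labels, threshold, impurity):
    """Impurity reduction from splitting on `value <= threshold`.

    With impurity=entropy this is information gain;
    with impurity=gini it is Gini gain.
    """
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty is useless
    n = len(labels)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(labels) - weighted


# Hypothetical toy data: one topological feature (subgraph density) and a
# label saying whether the subgraph is a known complex (1) or not (0).
density = [0.9, 0.8, 0.75, 0.3, 0.2, 0.6]
is_complex = [1, 1, 1, 0, 0, 0]

info_gain = split_gain(density, is_complex, 0.7, entropy)
gini_gain = split_gain(density, is_complex, 0.7, gini)
print(info_gain, gini_gain)
```

A tree learner would evaluate such gains for every candidate feature and threshold, expand the node with the best split, and recurse on the resulting subsets.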

Citation

Pages: 22108-22120

Page count: 13

25 references in total

[1] CFinder: locating cliques and overlapping modules in biological networks [J]. Adamcsek, B; Palla, G; Farkas, IJ; Derényi, I; Vicsek, T. BIOINFORMATICS, 2006, 22 (08): 1021-1023

[2] Development and implementation of an algorithm for detection of protein complexes in large interaction networks [J]. Altaf-Ul-Amin, Md; Shinbo, Yoko; Mihara, Kenji; Kurokawa, Ken; Kanaya, Shigehiko. BMC BIOINFORMATICS, 2006, 7 (1)

[3] [Anonymous], 2013, SACCHAROMYCES GENOME

[4] An automated method for finding molecular complexes in large protein interaction networks [J]. Bader, GD; Hogue, CW. BMC BIOINFORMATICS, 2003, 4 (1)

[5] SmcHD1, containing a structural-maintenance-of-chromosomes hinge domain, has a critical role in X inactivation [J]. Blewitt, Marnie E.; Gendrel, Anne-Valerie; Pang, Zhenyi; Sparrow, Duncan B.; Whitelaw, Nadia; Craig, Jeffrey M.; Apedaile, Anwyn; Hilton, Douglas J.; Dunwoodie, Sally L.; Brockdorff, Neil; Kay, Graham F.; Whitelaw, Emma. NATURE GENETICS, 2008, 40 (05): 663-669

[6] Haijian S., 2007, THESIS

[7] A comprehensive two-hybrid analysis to explore the yeast protein interactome [J]. Ito, T; Chiba, T; Ozawa, R; Yoshida, M; Hattori, M; Sakaki, Y. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2001, 98 (08): 4569-4574

[8] Jolliffe IT., 1986, Principal Component Analysis for Special Types of Data, P115, DOI 10.1007/978-1-4757-1904-8_7

[9] Jones K.S., 1981, INFORM RETRIEVAL EXP

[10] Protein complex prediction via cost-based clustering [J]. King, AD; Przulj, N; Jurisica, I. BIOINFORMATICS, 2004, 20 (17): 3013-3020
