An Empirical Study of Features Fusion Techniques for Protein-Protein Interaction Prediction

Times cited: 52

Authors:

Zeng, Jiancang [1]

Li, Dapeng [2]

Wu, Yunfeng [1]

Zou, Quan [3]

Liu, Xiangrong [1]

Affiliations:

[1] Xiamen Univ, Sch Informat Sci & Engn, Xiamen, Peoples R China

[2] Fourth Hosp Qinhuangdao, Dept Internal Med Oncol, Qinhuangdao, Peoples R China

[3] Tianjin Univ, Sch Comp Sci & Technol, Tianjin 300354, Peoples R China

Source:

CURRENT BIOINFORMATICS | 2016 / Vol. 11 / No. 1

Keywords:

Features fusion; features selection; Random Forests; protein-protein interaction; INTEGRATED RESOURCE; SIGNALING NETWORKS; FEATURE-SELECTION; IDENTIFICATION; INFORMATION; DATABASE; GRAM; RNA;

DOI:

10.2174/1574893611666151119221435

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Subject classification code:

071010 ; 081704 ;

Abstract:

With the recent development of bioinformatics, the importance of understanding protein function has been widely acknowledged. Most proteins perform their functions by interacting with other proteins, so exploring protein-protein interactions (PPIs) is an urgent task. At present, predicting PPIs remains a difficult problem. Although a variety of computational methods have been proposed to identify PPIs, most of them are complex and have low accuracy. Traditional methods extract features in two steps: first, they extract features from each of the two proteins in a PPI; second, they treat the two feature vectors as strings and concatenate them. Concatenation is, in effect, an addition operation on strings. The concatenation operator introduces redundant features, and its result depends on the order in which the two vectors are concatenated. Motivated by this, in this paper we study features fusion and features selection. The proposed framework consists of three stages. In the first stage, we obtained the negative data set from an off-the-shelf database; the reliability of the negative data sets used in previous studies was not a concern of this work. In the second stage, the n-gram frequency method was used to preprocess the PPI sequences. In the third stage, the final feature vector was assembled, and feature selection was then applied to find the optimal features. Finally, an effective parameter setting for the Random Forest classifier was selected. Experiments carried out on a real data set showed that our features fusion method outperformed traditional methods for protein-protein interaction prediction. These encouraging results may be helpful for future research on protein function.
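The abstract contrasts concatenating per-protein feature vectors with fusing them. The Python sketch below is illustrative only and rests on stated assumptions: dipeptide (n = 2) frequencies over the 20 standard amino acids, and element-wise addition as a stand-in fusion operator, since the abstract does not say which fusion operator the authors use. The names `ngram_frequency`, `concat_features` and `fuse_features` are hypothetical, and the two sequences are toy inputs. It demonstrates the two properties the abstract attributes to concatenation: the pair vector doubles in length, and its value depends on which protein comes first.

```python
from itertools import product

# The 20 standard amino acids; an assumed alphabet for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_frequency(seq: str, n: int = 2) -> list[float]:
    """Return the normalized frequency of every possible n-gram in `seq`.

    Hypothetical helper: one fixed-length vector of 20**n entries per protein.
    """
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    index = {gram: i for i, gram in enumerate(vocab)}
    counts = [0] * len(vocab)
    total = 0
    for i in range(len(seq) - n + 1):
        gram = seq[i:i + n]
        if gram in index:  # skip n-grams containing non-standard residues
            counts[index[gram]] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


def concat_features(a: list[float], b: list[float]) -> list[float]:
    """Traditional pair encoding: append b to a (length doubles, order matters)."""
    return a + b


def fuse_features(a: list[float], b: list[float]) -> list[float]:
    """Assumed stand-in fusion: element-wise sum (same length, order-invariant)."""
    return [x + y for x, y in zip(a, b)]


if __name__ == "__main__":
    # Toy sequences, not taken from the paper's data set.
    p1 = ngram_frequency("MKTAYIAKQR")
    p2 = ngram_frequency("GAVLIPFMW")
    print(len(concat_features(p1, p2)))  # 800: twice the 400 dipeptide features
    print(concat_features(p1, p2) == concat_features(p2, p1))  # False: order matters
    print(fuse_features(p1, p2) == fuse_features(p2, p1))  # True: order-invariant
```

Element-wise addition is only one symmetric choice; an element-wise product would also make the pair representation independent of protein order. In the framework the abstract describes, the resulting vectors would then go through feature selection before training a Random Forest classifier.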


Pages: 4-12

Number of pages: 9

Number of references: 39
