SFM: A novel sequence-based fusion method for disease genes identification and prioritization

Cited by: 10

Authors:

Yousef, Abdulaziz [1]

Charkari, Nasrollah Moghadam [1]

Affiliations:

[1] Tarbiat Modares Univ, Fac Elect & Comp Engn, Tehran, Iran

Source:

JOURNAL OF THEORETICAL BIOLOGY | 2015 / Vol. 383

Keywords:

Classification; Disease gene; Protein; Physicochemical properties of amino acid; Fusion method; PROTEIN-PROTEIN INTERACTIONS; PREDICTION; FEATURES; AUTOCORRELATION; CLASSIFICATION; SIMILARITY; SURFACE;

DOI:

10.1016/j.jtbi.2015.07.010

Chinese Library Classification (CLC) number:

Q [Biological Sciences];

Subject classification codes:

07 ; 0710 ; 09 ;

Abstract:

Identifying disease genes in the human genome is of great importance for improving the diagnosis and treatment of disease. Several machine learning methods have been introduced to identify disease genes. These methods differ mainly in three respects: the prior knowledge used to construct the feature vector for each instance (gene); the way negative data (non-disease genes) are selected, given that no experimental approach exists for finding them; and the classification method used to make the final decision. In this work, a novel sequence-based fusion method (SFM) is proposed to identify disease genes. Unlike existing methods, which rely on noisy and incomplete prior knowledge, SFM uses the amino acid sequences of proteins, which are universally available, to represent the genes (proteins) as four different feature vectors. To select the more likely negative data from the candidate genes, the intersection of four negative sets, each generated using a distance approach, is taken. A decision tree (C4.5) is then applied as a fusion method to combine the results of four independent state-of-the-art predictors based on the support vector machine (SVM) algorithm and to make the final decision. The proposed method was evaluated using standard measures, and the results indicate a precision, recall and F-measure of 82.6%, 85.6% and 84%, respectively. These results confirm the efficiency and validity of the proposed method. (C) 2015 Elsevier Ltd. All rights reserved.
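
As a quick consistency check on the reported figures, the F-measure can be recomputed from the stated precision and recall. The sketch below assumes the F-measure is the standard balanced F1 score (the harmonic mean of precision and recall); the abstract does not say which variant was used, and the function name `f_measure` is purely illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-measure" is F1; other beta weightings
    would give a different value.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
reported_precision = 82.6
reported_recall = 85.6

f1 = f_measure(reported_precision, reported_recall)
print(f"F-measure: {f1:.1f}%")
```

Under that assumption the recomputed value rounds to 84%, which agrees with the third figure reported in the abstract.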

Pages: 12-19

Number of pages: 8
