Knowledge discovery in medical and biological datasets using a hybrid Bayes classifier/evolutionary algorithm

Cited by: 75
Authors
Raymer, ML [1]
Doom, TE
Kuhn, LA
Punch, WF
Affiliations
[1] Wright State Univ, Dept Comp Sci & Engn, Dayton, OH 45435 USA
[2] Michigan State Univ, Dept Biochem, E Lansing, MI 48824 USA
[3] Michigan State Univ, Dept Comp Sci & Engn, E Lansing, MI 48824 USA
Source
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS | 2003 / Vol. 33 / Issue 05
Funding
National Science Foundation (USA);
Keywords
bioinformatics; evolutionary computing; genetic algorithms; pattern recognition;
DOI
10.1109/TSMCB.2003.816922
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
A key element of bioinformatics research is the extraction of meaningful information from large experimental data sets. Various approaches, including statistical and graph theoretical methods, data mining, and computational pattern recognition, have been applied to this task with varying degrees of success. Using a novel classifier based on the Bayes discriminant function, we present a hybrid algorithm that employs feature selection and extraction to isolate salient features from large medical and other biological data sets. We have previously shown that a genetic algorithm coupled with a k-nearest-neighbors classifier performs well in extracting information about protein-water binding from X-ray crystallographic protein structure data. Here we demonstrate that the hybrid EC-Bayes classifier distinguishes the features of this data set that are the most statistically relevant and weights these features appropriately to aid in the prediction of solvation sites.
Pages: 802-813
Page count: 12