Comparative study of classification algorithms for immunosignaturing data

Cited by: 27
Authors
Kukreja, Muskan [1]
Johnston, Stephen Albert [1]
Stafford, Phillip [1]
Affiliations
[1] Arizona State Univ, Ctr Innovat Med, Biodesign Inst, Tempe, AZ 85281 USA
Keywords
Immunosignature; Random peptide microarray; Data mining; Classification algorithms; Naive Bayes; Multilayer perceptron; Validation; Peptides
DOI
10.1186/1471-2105-13-139
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: High-throughput technologies such as DNA, RNA, protein, antibody and peptide microarrays are often used to examine differences across drug treatments, diseases, transgenic animals, and other conditions. Typically one trains a classification system by gathering large amounts of probe-level data, selecting informative features, and then classifying test samples using a small number of those features. As new microarrays are invented, classification systems that worked well for other array types may not be ideal. Expression microarrays, arguably one of the most prevalent array types, have been used for years to help develop classification algorithms, and many biological assumptions are built into classifiers that were designed for these types of data. One of the more problematic is the assumption of independence, both at the probe level and again at the biological level. Probes for RNA transcripts are designed to bind single transcripts. At the biological level, many genes have dependencies across transcriptional pathways, where co-regulation of transcriptional units may make many genes appear completely dependent. Thus, algorithms that perform well for gene expression data may not be suitable for other technologies with different binding characteristics. The immunosignaturing microarray is based on complex mixtures of antibodies binding to arrays of random-sequence peptides. It relies on many-to-many binding: each peptide can bind multiple antibodies and each antibody can bind multiple peptides. This technology has been shown to be highly reproducible and appears promising for diagnosing a variety of disease states. However, it is not clear which classification algorithm is optimal for analyzing this new type of data.
Results: We characterized several classification algorithms for analyzing immunosignaturing data. We selected several datasets that range from easy to difficult to classify, from simple monoclonal binding to complex binding patterns in asthma patients. We then classified the biological samples using 17 different classification algorithms. Using a wide variety of assessment criteria, we found 'Naive Bayes' far more useful than other widely used methods due to its simplicity, robustness, speed and accuracy.
Conclusions: The 'Naive Bayes' algorithm appears to accommodate the complex patterns hidden within multilayered immunosignaturing microarray data due to its fundamental mathematical properties.
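To make the classifier named in the abstract concrete, the following is a minimal sketch of a Naive Bayes classifier applied to peptide-intensity-like data. It is not the authors' implementation: the abstract does not state which software, likelihood model, or validation scheme was used. The Gaussian per-peptide likelihood, the leave-one-out evaluation, the synthetic data, and all function names (fit_gaussian_nb, predict, make_synthetic, leave_one_out_accuracy) are assumptions chosen for illustration. The sketch shows the independence assumption discussed above: each peptide contributes its own log-likelihood term, and the terms are simply summed.

```python
import math
import random


def fit_gaussian_nb(X, y):
    """Estimate a log-prior and per-feature mean/variance for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor on the variance avoids division by zero.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, 1e-9)
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest posterior log-score for sample x."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, means, variances) in model.items():
        score = log_prior
        # Naive independence: per-feature log-likelihoods are summed.
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def make_synthetic(n_per_class=20, n_peptides=50, n_informative=10, seed=0):
    """Hypothetical data: only the first n_informative peptides differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for label, shift in (("control", 0.0), ("disease", 2.0)):
        for _ in range(n_per_class):
            row = [
                rng.gauss(shift if j < n_informative else 0.0, 1.0)
                for j in range(n_peptides)
            ]
            X.append(row)
            y.append(label)
    return X, y


def leave_one_out_accuracy(X, y):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(X)):
        model = fit_gaussian_nb(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += predict(model, X[i]) == y[i]
    return correct / len(X)


X, y = make_synthetic()
accuracy = leave_one_out_accuracy(X, y)
print(f"leave-one-out accuracy: {accuracy:.2f}")
```

Because training only requires one mean and one variance per peptide per class, fitting is a single pass over the data, which is consistent with the simplicity and speed the abstract attributes to Naive Bayes. Real immunosignaturing data would first need normalization and feature selection, which this sketch omits.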
Pages: 14