Pathway-based microarray analysis for robust disease classification

Cited by: 13
Authors
Sootanan, Pitak [2 ]
Prom-on, Santitham [3 ]
Meechai, Asawin [4 ]
Chan, Jonathan H. [1 ]
Affiliations
[1] King Mongkuts Univ Technol Thonburi, Sch Informat Technol, Bangkok, Thailand
[2] King Mongkuts Univ Technol Thonburi, Individual Based Program Bioinformat, Bangkok, Thailand
[3] King Mongkuts Univ Technol Thonburi, Dept Comp Engn, Bangkok, Thailand
[4] King Mongkuts Univ Technol Thonburi, Dept Chem Engn, Bangkok, Thailand
Keywords
Microarray analysis; Disease classification; Pathway activity; Negatively correlated feature sets; Phenotype-correlated genes; Discriminative score; GENE-EXPRESSION; CANCER;
DOI
10.1007/s00521-011-0662-y
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The advent of high-throughput technology has made it possible to measure genome-wide expression profiles, thus providing a new basis for microarray-based diagnosis of disease states. Numerous methods have been proposed to identify biomarkers that can accurately discriminate between case and control classes. Many of these methods use only a subset of ranked genes in the pathway and may not be able to fully represent the classification boundaries for the two disease classes. This study proposes the use of negatively correlated feature sets (NCFS) to obtain more relevant features in the form of phenotype-correlated genes (PCOGs) and to infer pathway activities. The two pathway activity inference schemes that use NCFS significantly improved the power of pathway markers to discriminate between two phenotype classes in microarray expression datasets of breast cancer. In particular, the NCFS-i method provided better contrasting features for classification purposes. The improvement is consistent for all cases of pathways used, in both within- and across-dataset validations. The results show that the two proposed methods that use NCFS clearly outperformed other pathway-based classifiers in terms of both ROC area and discriminative score. That is, the identification of PCOGs within each pathway, especially with the NCFS-i method, helps to reduce noisy or variable measurements, leading to a higher-performing and more robust classifier. In summary, we have demonstrated that effectively incorporating pathway information into expression-based disease diagnosis and using NCFS can provide more discriminative and more robust models.
Pages: 649-660
Page count: 12
Related papers
23 in total
[1]   [Anonymous], HDB BIOL STAT
[2]   Cancer - Gene expression in diagnosis [J].
Berns, A .
NATURE, 2000, 403 (6769) :491-492
[3]   Oncogenic pathway signatures in human cancers as a guide to targeted therapies [J].
Bild, AH ;
Yao, G ;
Chang, JT ;
Wang, QL ;
Potti, A ;
Chasse, D ;
Joshi, MB ;
Harpole, D ;
Lancaster, JM ;
Berchuck, A ;
Olson, JA ;
Marks, JR ;
Dressman, HK ;
West, M ;
Nevins, JR .
NATURE, 2006, 439 (7074) :353-357
[4]   Critical review of published microarray studies for cancer outcome and guidelines on statistical analysis and reporting [J].
Dupuy, Alain ;
Simon, Richard M. .
JNCI-JOURNAL OF THE NATIONAL CANCER INSTITUTE, 2007, 99 (02) :147-157
[5]   Gene Expression Omnibus: NCBI gene expression and hybridization array data repository [J].
Edgar, R ;
Domrachev, M ;
Lash, AE .
NUCLEIC ACIDS RESEARCH, 2002, 30 (01) :207-210
[6]   Pathway-specific differences between tumor cell lines and normal and tumor tissue cells [J].
Ertel, Adam ;
Verghese, Arun ;
Byers, Stephen W. ;
Ochs, Michael ;
Tozeren, Aydin .
MOLECULAR CANCER, 2006, 5 (1)
[7]   Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring [J].
Golub, TR ;
Slonim, DK ;
Tamayo, P ;
Huard, C ;
Gaasenbeek, M ;
Mesirov, JP ;
Coller, H ;
Loh, ML ;
Downing, JR ;
Caligiuri, MA ;
Bloomfield, CD ;
Lander, ES .
SCIENCE, 1999, 286 (5439) :531-537
[8]   Towards precise classification of cancers based on robust gene functional expression profiles [J].
Guo, Z ;
Zhang, TW ;
Li, X ;
Wang, Q ;
Xu, JZ ;
Yu, H ;
Zhu, J ;
Wang, HY ;
Wang, CG ;
Topol, EJ ;
Wang, Q ;
Rao, SQ .
BMC BIOINFORMATICS, 2005, 6 (1)
[9]   Hall M., 2009, SIGKDD Explorations, V11, P10, DOI 10.1145/1656274.1656278
[10]   Helman P, 2004, J COMPUT BIOL, V11, P581, DOI 10.1089/1066527041887294