A data structure and function classification based method to evaluate clustering models for gene expression data

Cited by: 0
Authors
Yi Dong (易东)
Yang Mengsu (杨梦苏)
Huang Minghui (黄明辉)
Li Huizhi (李辉智)
Wang Wenchang (王文昌)
Affiliations
[1] Department of Medical Statistics, Third Military Medical University, Chongqing, China
[2] Applied Research Centre for Genomics Technology, Department of Biology & Chemistry, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
[3] Department of Electronic Technology, Southwest University of Politics and Law Science, China
Keywords
gene expression; evaluation of clustering; adjust-FOM; entropy
DOI
Not available
Chinese Library Classification number
R311 [Medical mathematics]
Discipline classification code
1001
Abstract
Objective: To establish a systematic framework for selecting the best clustering algorithm and to provide an evaluation method for clustering analyses of gene expression data. Methods: Clustering analyses of gene expression data were evaluated with 2 approaches, one based on data structure (internal information) and one based on function classification (external information). First, to assess the predictive power of clustering algorithms, entropy was introduced to measure the consistency between the clustering results from different algorithms and known, validated functional classifications. Second, a modified figure of merit (adjust-FOM) was used as the internal assessment method. In this method, a clustering algorithm is applied to the data from all experimental conditions but one, and the remaining condition is used to assess the predictive power of the resulting clusters. This method was applied to 3 gene expression data sets (2 from the Iyer's Serum Data Sets, and 1 from the Ferea's Saccharomyces
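The two measures described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the entropy below is assumed to be the common size-weighted cluster entropy computed against known functional classes, and the adjustment to FOM is assumed to be division by sqrt((n - k) / n) for n genes and k clusters, as in the adjusted figure of merit of Yeung et al. The function names, the toy k-means, and the toy data are all hypothetical.

```python
import numpy as np


def clustering_entropy(cluster_labels, class_labels):
    """Size-weighted entropy (bits) of known classes inside each cluster.

    0 means every cluster contains a single functional class.
    Assumed definition; the abstract does not state the formula.
    """
    cluster_labels = np.asarray(cluster_labels)
    class_labels = np.asarray(class_labels)
    n = len(cluster_labels)
    total = 0.0
    for c in np.unique(cluster_labels):
        members = class_labels[cluster_labels == c]
        _, counts = np.unique(members, return_counts=True)
        p = counts / counts.sum()
        cluster_entropy = -(p * np.log2(p)).sum()
        total += (len(members) / n) * cluster_entropy
    return float(total)


def adjusted_fom(data, cluster_fn, k):
    """Leave-one-condition-out figure of merit, summed over conditions.

    data: genes x conditions matrix. cluster_fn(matrix, k) returns one
    label per gene. The sum is divided by sqrt((n - k) / n), which is
    an assumed form of the "adjust-FOM" correction.
    """
    data = np.asarray(data, dtype=float)
    n, m = data.shape
    total = 0.0
    for e in range(m):
        # Cluster on every condition except e.
        labels = np.asarray(cluster_fn(np.delete(data, e, axis=1), k))
        held_out = data[:, e]
        sq_err = 0.0
        for c in np.unique(labels):
            vals = held_out[labels == c]
            # Deviation of held-out values from their cluster mean.
            sq_err += ((vals - vals.mean()) ** 2).sum()
        total += np.sqrt(sq_err / n)
    return float(total / np.sqrt((n - k) / n))


def kmeans(x, k, n_iter=50, seed=0):
    """Toy k-means used only to have something to evaluate."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels


# Toy data: 10 "genes", 4 "conditions", two synthetic functional classes.
rng = np.random.default_rng(1)
toy = np.vstack([rng.normal(0, 0.1, (5, 4)), rng.normal(5, 0.1, (5, 4))])
known = ["A"] * 5 + ["B"] * 5
print("entropy:", clustering_entropy(kmeans(toy, 2), known))
print("adjusted FOM:", adjusted_fom(toy, kmeans, 2))
```

Under these assumptions, lower values of both measures indicate a better clustering, so competing algorithms can be ranked by running each one through the same two functions on the same data.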
Pages: 312-317
Number of pages: 6