Rough Set Methods for Attribute Clustering and Selection

Cited by: 48
Authors
Janusz, Andrzej [1]
Slezak, Dominik [1, 2]
Affiliations
[1] Univ Warsaw, Inst Math, PL-02097 Warsaw, Poland
[2] Infobright Inc, Warsaw, Poland
Keywords
CLASSIFIERS; DIAGNOSIS
DOI
10.1080/08839514.2014.883902
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this study we investigate methods for attribute clustering and their possible applications to the task of computing decision reducts from information systems. We focus on high-dimensional datasets, that is, microarray data. For this type of data, traditional reduct construction techniques can either be extremely computationally intensive or yield poor performance in terms of the size of the resulting reducts. We propose two reduct computation heuristics that combine greedy search with a diverse selection of candidate attributes. Our experiments confirm that by properly grouping similar (in some sense interchangeable) attributes, it is possible to significantly decrease computation time, as well as to increase the quality of the obtained reducts (i.e., to decrease their average size). We examine several criteria for attribute clustering, and we also identify so-called garbage clusters, which contain attributes that can be regarded as irrelevant.
Pages: 220-242
Page count: 23