Mining Top-K Frequent Closed Patterns from Gene Expression Data

Cited by: 1
Authors
Ji, Shufan [1]
Wang, Xuejiao [1]
Zong, Yi [1]
Gao, Xiaopeng [1]
Affiliations
[1] Beihang Univ, Comp Coll, Beijing, Peoples R China
Source
2014 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOP (ICDMW) | 2014
DOI
10.1109/ICDMW.2014.61
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Analyzing microarray gene expression data gives biologists deep insight into gene functions and gene regulatory networks. In this paper, we propose FCPminer, a novel and efficient algorithm that mines the top-k frequent closed patterns (FCPs) from gene expression data, that is, the k closed patterns of highest support whose length is no less than minL. FCPminer uses a prefix FP-tree data structure with a top-down, best-first search strategy, so that FCPs of adequate length with the highest support are mined first. Compared with existing top-k FCP mining algorithms, FCPminer is much more efficient because it avoids expanding nodes with inadequate length (less than minL) or low support (ranked below the top k) during mining. FCPminer further improves efficiency with a hash-based closedness-checking method. Experimental results on real biological and synthetic data show that FCPminer outperforms existing state-of-the-art algorithms, especially on large and dense datasets.
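To make the problem statement in the abstract concrete, the following is a minimal Python sketch of top-k closed pattern mining with a length floor. It is not FCPminer: it does not build a prefix FP-tree and does not reproduce the paper's pruning rules or closedness check. Instead it assumes a plain best-first search over closed itemsets, ordered by support with a heap, and uses a Python set of frozensets as a simple hash-based duplicate filter. The names `closure`, `top_k_closed`, `min_len`, and the toy gene labels are all hypothetical, chosen for illustration only.

```python
import heapq


def closure(itemset, rows):
    """Return (closed itemset, support) for the rows that contain `itemset`.

    The closed itemset is the intersection of all covering rows; the
    support is the number of covering rows.
    """
    covering = [r for r in rows if itemset <= r]
    if not covering:
        return frozenset(), 0
    return frozenset.intersection(*covering), len(covering)


def top_k_closed(rows, k, min_len):
    """Best-first search for the k closed patterns of highest support
    whose length is at least `min_len` (illustrative, not FCPminer)."""
    rows = [frozenset(r) for r in rows]
    items = sorted(set().union(*rows))

    root, support = closure(frozenset(), rows)
    # Max-heap on support via negated keys; the sorted tuple breaks ties.
    heap = [(-support, tuple(sorted(root)))]
    seen = {root}  # hash set: each closed pattern is queued only once
    result = []

    while heap and len(result) < k:
        neg_support, key = heapq.heappop(heap)
        pattern = frozenset(key)
        if len(pattern) >= min_len:
            result.append((sorted(pattern), -neg_support))
        # Extend by one item and close; support can only stay or shrink,
        # so patterns leave the heap in non-increasing support order.
        for item in items:
            if item in pattern:
                continue
            child, child_support = closure(pattern | {item}, rows)
            if child_support > 0 and child not in seen:
                seen.add(child)
                heapq.heappush(heap, (-child_support, tuple(sorted(child))))
    return result


if __name__ == "__main__":
    # Toy data: each row is one sample, each item one (hypothetical) gene.
    samples = [
        {"g1", "g2", "g3"},
        {"g1", "g2", "g4"},
        {"g1", "g2", "g3", "g4"},
        {"g2", "g3"},
        {"g1", "g2", "g3"},
    ]
    print(top_k_closed(samples, k=3, min_len=2))
    # [(['g1', 'g2'], 4), (['g2', 'g3'], 4), (['g1', 'g2', 'g3'], 3)]
```

Because adding an item never increases support, a pattern popped from the heap has support at least as high as anything still reachable, so the search can stop as soon as k patterns of sufficient length have been reported. Patterns shorter than `min_len` are still expanded here; avoiding that work is the kind of pruning the abstract attributes to FCPminer.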
Pages: 732-739
Number of pages: 8
Related papers
21 in total
[1]  
Agrawal R., 1993, SIGMOD Record, V22, P207, DOI 10.1145/170036.170072
[2]  
Agrawal R., P 20 INT C VERY LARG
[3]  
[Anonymous], 2000, SIGMOD INT WORKSHOP
[4]  
Besson R, 2004, LECT NOTES ARTIF INT, V3056, P615
[5]  
Chuang K., 2008, VLDB J
[6]   Mining frequent closed patterns in microarray data [J].
Cong, G ;
Tan, KL ;
Tung, AKH ;
Pan, F .
FOURTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2004, :363-366
[7]  
Han JW, 2000, SIGMOD RECORD, V29, P1
[8]   Compressed hierarchical mining of frequent closed patterns from dense data sets [J].
Ji, Liping ;
Tan, Kian-Lee ;
Tung, Anthony K. H. .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2007, 19 (09) :1175-1187
[9]   Pincer-search: An efficient algorithm for discovering the maximum frequent set [J].
Lin, DI ;
Kedem, ZM .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2002, 14 (03) :553-566
[10]  
Nataraj R. V., 2009, P 2 BANG ANN COMP C