A new mining approach for uncertain databases using CUFP trees

被引:47
作者
Lin, Chun-Wei [1 ]
Hong, Tzung-Pei [1 ,2 ]
机构
[1] Natl Univ Kaohsiung, Dept Comp Sci & Informat Engn, Kaohsiung 811, Taiwan
[2] Natl Sun Yat Sen Univ, Dept Comp Sci & Engn, Kaohsiung 804, Taiwan
关键词
Data mining; CUFP tree; FP tree; Uncertain database; Existential probability; FREQUENT; GENERATION; ALGORITHM;
D O I
10.1016/j.eswa.2011.09.087
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In the past, many algorithms have been proposed to mine frequent itemsets from transactional databases, in which the presence or absence of items in transactions was certainly known. In some applications, items may also be uncertain in transactions with their existential probabilities ranging from 0 to 1 in the uncertain dataset. Apparently, the processing in uncertain datasets is quite different from those in certain datasets. The UF-tree algorithm was proposed to construct the UF-tree structure from an uncertain dataset and mine frequent itemsets from the tree. In the UF-tree construction process, however, only the same items with the same existential probabilities in transactions were merged together in the tree, thus causing many redundant nodes in the tree. In this paper, a new tree structure called the compressed uncertain frequent-pattern tree (CUFP tree) is designed to efficiently keep the related information in the mining process. In the CUFP tree, the same items will be merged in a branch of the tree even when the existential probabilities in transactions are not the same. A mining algorithm called the CUFP-mine algorithm is then proposed based on the tree structure to find uncertain frequent patterns. Experimental results show that the proposed approach has a better performance than UF-tree algorithm both in the execution time and in the number of tree nodes. (C) 2011 Elsevier Ltd. All rights reserved.
引用
收藏
页码:4084 / 4093
页数:10
相关论文
共 26 条
[1]   A tree projection algorithm for generation of frequent item sets [J].
Agarwal, RC ;
Aggarwal, CC ;
Prasad, VVV .
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2001, 61 (03) :350-371
[2]  
Agrawal R., 1993, SIGMOD Record, V22, P207, DOI 10.1145/170036.170072
[3]   DATABASE MINING - A PERFORMANCE PERSPECTIVE [J].
AGRAWAL, R ;
IMIELINSKI, T ;
SWAMI, A .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 1993, 5 (06) :914-925
[4]  
Agrawal R., 1994, P 20 INT C VER LARG, P487, DOI DOI 10.5555/645920.672836
[5]  
[Anonymous], 2003, Proc. of the ACM SIGMOD International Conference on Management of Data, DOI DOI 10.1145/872757
[6]  
[Anonymous], 2002, P 8 ACM SIGKDD INT C, DOI DOI 10.1145/775047.775081
[7]   TBAR:: An efficient method for association rule mining in relational databases [J].
Berzal, F ;
Cubero, JC ;
Marín, N ;
Serrano, JM .
DATA & KNOWLEDGE ENGINEERING, 2001, 37 (01) :47-64
[8]  
Brin S., 1997, SIGMOD Record, V26, P255, DOI [10.1145/253262.253327, 10.1145/253262.253325]
[9]   Perfect hashing schemes for mining association rules [J].
Chang, CC ;
Lin, CY .
COMPUTER JOURNAL, 2005, 48 (02) :168-179
[10]   Data mining: An overview from a database perspective [J].
Chen, MS ;
Han, JW ;
Yu, PS .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 1996, 8 (06) :866-883