An Optimized Graph-based Clustering for Multi-database Mining

Cited by: 1
Authors
Miloudi, Salim [1]
Wang, Yulin [1]
Ding, Wenjia [1]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan 430072, Peoples R China
Source
2020 IEEE 32ND INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI) | 2020
Keywords
multi-database mining; frequent item-sets; graph clustering; dual gradient descent; convex optimization; CLASSIFICATION;
DOI
10.1109/ICTAI50040.2020.00128
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multinational corporations maintain multiple databases distributed across their branches, which store millions of transactions per day. For business applications, identifying disjoint clusters of homogeneous databases helps reveal the common patterns among customers and also increases profits by targeting potential clients in the future. In this paper, we present an effective approach to search for the optimal clustering of multiple transactional databases in a weighted undirected similarity graph. To assess the clustering quality, we use dual gradient descent to minimize a constrained quasi-convex loss function whose parameters determine the edges needed to form the optimal database clusters in the graph. As a result, the global minimum is guaranteed to be found in a finite and short time, in contrast to existing non-convex objectives, for which all possible candidate classifications must be generated to find the ideal clustering. Moreover, our algorithm does not require the number of clusters to be specified a priori, and it uses a disjoint-set forest data structure to keep track of the clusters as they are updated. We performed extensive experiments on public data samples and compared our algorithm with one of the best previous algorithms for clustering multiple databases. The experiments show that our algorithm outperforms the previous algorithms in both accuracy and running time.
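The role of the disjoint-set forest mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the fixed `threshold` below is a hypothetical stand-in for the edge selection that the paper derives from its optimized loss parameters, and the names `DisjointSetForest` and `cluster_databases` are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple


class DisjointSetForest:
    """Union-find with path halving and union by rank."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))  # each database starts in its own cluster
        self.rank = [0] * n

    def find(self, x: int) -> int:
        # Path halving: point every visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> bool:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # already in the same cluster
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the shallower tree under the deeper one
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True


def cluster_databases(
    n: int, edges: List[Tuple[int, int, float]], threshold: float
) -> List[List[int]]:
    """Merge databases joined by an edge whose similarity is >= threshold.

    `threshold` is an illustrative assumption, not the paper's criterion.
    """
    forest = DisjointSetForest(n)
    for i, j, similarity in edges:
        if similarity >= threshold:
            forest.union(i, j)
    groups: Dict[int, List[int]] = {}
    for db in range(n):
        groups.setdefault(forest.find(db), []).append(db)
    return sorted(groups.values())


# Five databases with made-up pairwise similarities (weighted undirected edges).
edges = [(0, 1, 0.9), (1, 2, 0.8), (2, 3, 0.2), (3, 4, 0.7)]
print(cluster_databases(5, edges, threshold=0.5))
```

Note that the number of clusters is never passed in: it falls out of which edges are kept, which mirrors the abstract's point that the cluster count need not be specified a priori.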
Pages: 807-812
Page count: 6