A Gradient-Based Clustering for Multi-Database Mining

Cited by: 3
Authors
Miloudi, Salim [1]
Wang, Yulin [1]
Ding, Wenjia [1]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan 430072, Peoples R China
Keywords
Databases; Itemsets; Clustering algorithms; Data models; Prototypes; Computer science; Computational modeling; Multi-database mining; graph clustering; dual gradient descent; quasi-convex optimization; similarity measure; HIGH-FREQUENCY RULES; INTERESTING PATTERNS; ITEM RECOMMENDATION; ALGORITHMS; CLASSIFICATION
DOI
10.1109/ACCESS.2021.3050404
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Multinational corporations have multiple databases distributed throughout their branches, which store millions of transactions per day. For business applications, identifying disjoint clusters of similar and relevant databases contributes to learning the common buying patterns among customers and also increases the profits by targeting potential clients in the future. This process is called clustering, which is an important unsupervised technique for big data mining. In this article, we present an effective approach to search for the optimal clustering of multiple transaction databases in a weighted undirected similarity graph. To assess the clustering quality, we use dual gradient descent to minimize a constrained quasi-convex loss function whose parameters will determine the edges needed to form the optimal database clusters in the graph. Therefore, finding the global minimum is guaranteed in a finite and short time compared with the existing non-convex objectives where all possible candidate clusterings are generated to find the ideal clustering. Moreover, our algorithm does not require specifying the number of clusters a priori and uses a disjoint-set forest data structure to maintain and keep track of the clusters as they are updated. Through a series of experiments on public data samples and precomputed similarity matrices, we show that our algorithm is more accurate and faster in practice than the existing clustering algorithms for multi-database mining.
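The abstract mentions a disjoint-set forest for tracking database clusters in a weighted undirected similarity graph. The following is a minimal sketch of that data structure only, not the authors' algorithm: in the paper the edges that form clusters are determined by minimizing a loss function, whereas here a fixed, hypothetical `threshold` stands in for that step. The names `DisjointSetForest` and `cluster_databases` are illustrative assumptions.

```python
from typing import Dict, List


class DisjointSetForest:
    """Union-find with path compression and union by rank."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))  # each node starts as its own root
        self.rank = [0] * n

    def find(self, x: int) -> int:
        # Locate the root of x's tree.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a: int, b: int) -> bool:
        """Merge the sets containing a and b; return False if already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the shallower tree under the deeper one
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True


def cluster_databases(similarity: List[List[float]], threshold: float) -> List[List[int]]:
    """Group database indices connected by edges with similarity >= threshold.

    ASSUMPTION: a fixed threshold replaces the paper's optimized edge selection.
    """
    n = len(similarity)
    forest = DisjointSetForest(n)
    for i in range(n):
        for j in range(i + 1, n):  # undirected graph: visit each pair once
            if similarity[i][j] >= threshold:
                forest.union(i, j)
    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(forest.find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Hypothetical symmetric similarity matrix for four databases.
    sim = [
        [1.0, 0.8, 0.1, 0.2],
        [0.8, 1.0, 0.15, 0.1],
        [0.1, 0.15, 1.0, 0.7],
        [0.2, 0.1, 0.7, 1.0],
    ]
    print(cluster_databases(sim, threshold=0.5))
```

As in the abstract's description, the number of clusters is not specified in advance; it falls out of which edges are kept. Raising or lowering the hypothetical threshold changes how many disjoint sets remain.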
Pages: 11144-11172
Page count: 29