Parallel mining of uncertain data using segmentation of data set area and Voronoi diagrams

Cited by: 0
Authors
Lukic, Ivica [1]
Hocenski, Zeljko [1]
Kohler, Mirko [1]
Galba, Tomislav [1]
Affiliations
[1] Josip Juraj Strossmayer Univ Osijek, Fac Elect Engn Comp Sci & Informat Technol Osijek, Dept Comp Engn & Automat, Osijek, Croatia
Keywords
Clustering algorithms; data mining; data uncertainty; Euclidean distance; parallel algorithms
DOI
10.1080/00051144.2018.1541645
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The clustering of uncertain objects in large uncertain databases, and the problem of mining uncertain data more generally, have been well studied. This paper studies the clustering of uncertain objects with location uncertainty. Moving objects, such as mobile devices, report their locations only periodically, so their locations are uncertain and are best described by a probability density function. The number of objects in a database can be large, which makes mining accurate data a challenging and time-consuming task. The authors give an overview of existing clustering methods and present a new approach to data mining and parallel computing of clustering problems. All existing methods use pruning to avoid expected distance calculations, because calculating the expected distance requires numerical integration, which is time-consuming. Therefore, a new method, called Segmentation of Data Set Area-Parallel, is proposed. In this method, the data set area is divided into many small segments, and only the clusters and objects within a given segment are considered when that segment is processed. The number of segments is calculated from the number and location of the clusters. Because the segments are mutually independent, they can be processed in parallel on multiple cores.
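The segment-wise idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that are not in the record: a uniform square grid with a user-chosen cell size stands in for the paper's segment calculation, each uncertain object is represented by a list of sampled 2-D locations, the expected distance is approximated by a sample mean rather than by numerical integration, and a segment that contains no cluster falls back to all clusters. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from math import hypot


def segment_of(point, cell):
    """Grid cell containing a 2-D point (the uniform grid is an assumption)."""
    return (int(point[0] // cell), int(point[1] // cell))


def mean_location(samples):
    """Average of an object's sampled locations, used to place it in a segment."""
    n = len(samples)
    return (sum(x for x, _ in samples) / n, sum(y for _, y in samples) / n)


def expected_distance(samples, center):
    """Approximate the expected distance as the mean over location samples."""
    return sum(hypot(x - center[0], y - center[1]) for x, y in samples) / len(samples)


def assign_segment(objects, centers, all_centers):
    """Assign every object of one segment to its closest candidate cluster."""
    # Assumed fallback: a segment without any cluster considers all clusters.
    candidates = centers or all_centers
    return {
        name: min(candidates, key=lambda c: expected_distance(samples, candidates[c]))
        for name, samples in objects.items()
    }


def cluster_by_segments(objects, centers, cell, workers=4):
    """Split objects and clusters by segment, then process segments concurrently."""
    segments = {}
    for name, samples in objects.items():
        segments.setdefault(segment_of(mean_location(samples), cell), {})[name] = samples
    local = {}
    for cid, center in centers.items():
        local.setdefault(segment_of(center, cell), {})[cid] = center
    result = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(assign_segment, objs, local.get(seg, {}), centers)
            for seg, objs in segments.items()
        ]
        for future in futures:
            result.update(future.result())
    return result


objects = {
    "a": [(1.0, 1.0), (1.2, 0.8)],
    "b": [(9.0, 9.0), (8.8, 9.2)],
    "c": [(2.0, 7.0)],
}
centers = {0: (2.0, 2.0), 1: (8.0, 8.0)}
assignment = cluster_by_segments(objects, centers, cell=5.0)
```

Because each call to `assign_segment` touches only its own segment's data, the tasks share no mutable state. A thread pool is used here only to keep the sketch easy to run; in CPython, spreading the work across cores would likely need a process pool instead.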
Pages: 349-356
Number of pages: 8
Related papers
50 in total
  • [21] Research On Large outliers in the data set data mining algorithm
    Zhang, Jinhai
    PROCEEDINGS OF THE 2016 4TH INTERNATIONAL CONFERENCE ON MACHINERY, MATERIALS AND COMPUTING TECHNOLOGY, 2016, 60 : 1743 - 1747
  • [22] Disease Influence Measure Based Diabetic Prediction with Medical Data Set Using Data Mining
    Baiju, B. V.
    Aravindhar, D. John
    PROCEEDINGS OF 2019 1ST INTERNATIONAL CONFERENCE ON INNOVATIONS IN INFORMATION AND COMMUNICATION TECHNOLOGY (ICIICT 2019), 2019
  • [23] Discovery of Hidden Patterns in Breast Cancer Patients, Using Data Mining on a Real Data Set
    Atashi, Alireza
    Tohidinezhad, Fariba
    Dorri, Sara
    Nazeri, Najmeh
    Ghousi, Rouzbeh
    Marashi, Sina
    Hajialiasgari, Fatemeh
    HEALTH INFORMATICS VISION: FROM DATA VIA INFORMATION TO KNOWLEDGE, 2019, 262 : 142 - 145
  • [24] Data mining on parallel database systems
    Sousa, M
    Mattoso, M
    Ebecken, N
    INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED PROCESSING TECHNIQUES AND APPLICATIONS, VOLS I-IV, PROCEEDINGS, 1998 : 1147 - 1154
  • [25] Mining time series data for segmentation by using Ant Colony Optimization
    Weng, Sung-Shun
    Liu, Yuan-Hung
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2006, 173 (03) : 921 - 937
  • [26] Hyper-structure mining of frequent patterns in uncertain data streams
    HewaNadungodage, Chandima
    Xia, Yuni
    Lee, Jaehwan John
    Tu, Yi-cheng
    KNOWLEDGE AND INFORMATION SYSTEMS, 2013, 37 (01) : 219 - 244
  • [27] Effect of data skewness and workload balance in parallel data mining
    Cheung, DW
    Lee, SD
    Xiao, YQ
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2002, 14 (03) : 498 - 514
  • [28] Hyper-structure mining of frequent patterns in uncertain data streams
    Chandima HewaNadungodage
    Yuni Xia
    Jaehwan John Lee
    Yi-cheng Tu
    Knowledge and Information Systems, 2013, 37 : 219 - 244
  • [29] IMPROVED BISECTOR CLUSTERING OF UNCERTAIN DATA USING SDSA METHOD ON PARALLEL PROCESSORS
    Lukic, Ivica
    Slavek, Ninoslav
    Koehler, Mirko
    TEHNICKI VJESNIK-TECHNICAL GAZETTE, 2013, 20 (02) : 255 - 261
  • [30] Rough set extension of Tcl for data mining
    Griffin, G
    Chen, Z
    KNOWLEDGE-BASED SYSTEMS, 1998, 11 (3-4) : 249 - 253