Research on parallel data processing of data mining platform in the background of cloud computing

Cited by: 1
Authors
Bu, Lingrui [1]
Zhang, Hui [1]
Xing, Haiyan [1]
Wu, Lijun [1]
Affiliation
[1] Shandong Lab Vocat & Tech Coll, Jinan 250022, Shandong, Peoples R China
Keywords
Cloud computing; data mining; parallel processing; Hadoop platform; clustering algorithm; big data
DOI
10.1515/jisys-2020-0113
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Efficient processing of large-scale data has considerable practical value. In this study, a data mining platform based on the Hadoop distributed file system was designed, and the K-means algorithm was improved using the max-min distance idea. On the Hadoop distributed file system platform, the algorithm was parallelized with MapReduce. Finally, the data processing performance of the algorithm was analyzed on the Iris data set. The results showed that the parallel algorithm classified more samples correctly than the traditional algorithm. In a single-machine environment, the parallel algorithm took longer to run. On large data sets, the traditional algorithm ran out of memory, whereas the parallel algorithm completed the computation. The speedup ratio of the parallel algorithm increased as the cluster size and the data set size grew, indicating good parallel performance. The experimental results verify the reliability of the parallel algorithm in big data processing and contribute to further improving the efficiency of data mining.
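The abstract does not give the details of the improved algorithm, so the following is only a minimal single-process sketch of the general idea, under stated assumptions: initial centers are chosen by a max-min distance rule with a fixed number of clusters k, starting from the first point, and each K-means iteration is split into a map step (assign each point to its nearest center) and a reduce step (average the points per center). The function names `max_min_init`, `map_phase`, `reduce_phase`, and `kmeans` are hypothetical and are not taken from the paper. No Hadoop API is used here.

```python
import math
from collections import defaultdict


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_min_init(points, k):
    """Pick k initial centers with a max-min distance rule.

    Assumption: the first point seeds the centers; each further center is
    the point whose distance to its nearest chosen center is largest.
    """
    centers = [points[0]]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(nxt)
    return centers


def map_phase(points, centers):
    """Map step: emit (index of nearest center, point) pairs."""
    for p in points:
        idx = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        yield idx, p


def reduce_phase(pairs, centers):
    """Reduce step: average the points emitted under each center index."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    new_centers = list(centers)  # a center with no points keeps its position
    for idx, pts in groups.items():
        new_centers[idx] = tuple(sum(col) / len(pts) for col in zip(*pts))
    return new_centers


def kmeans(points, k, max_iter=100, tol=1e-9):
    """Iterate map and reduce steps until the centers stop moving."""
    centers = max_min_init(points, k)
    for _ in range(max_iter):
        new_centers = reduce_phase(map_phase(points, centers), centers)
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(kmeans(data, 2))
```

In a real MapReduce job the map step would run on separate data splits and the reduce step would receive the pairs grouped by key; here both run in one process purely to show the data flow.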
Pages: 479-486
Number of pages: 8