An Efficient Clustering Technique for Big Data Mining

Cited by: 0
Authors
Banait, Satish S. [1]
Sane, S. S. [1]
Talekar, Sopan A. [2]
Affiliations
[1] SPPU, Dept Comp Engn, KKWIEER, Pune, Maharashtra, India
[2] SPPU Pune, NDMVPSs KBT Coll Engn, Dept Comp Engn, Pune, Maharashtra, India
Source
INTERNATIONAL JOURNAL OF NEXT-GENERATION COMPUTING | 2022 / Vol. 13 / No. 03
Keywords
Big Data; Clustering technique; Unsupervised learning; Efficient clustering; K-means clustering; Data mining
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Data mining and big data analytics are approaches for analyzing data and extracting hidden information. Because big data is complex and large in volume, traditional analysis and extraction techniques do not work effectively on it. Data clustering is a common data mining approach that divides data into groups, making it easier to extract information from them. Big data can include both structured and semi-structured information, and it is becoming increasingly valuable to companies. Examples include traditional structured databases of inventory levels, transactions, and customer information, as well as unstructured information from the internet, social media platforms, and embedded systems. Numerous schemes have been developed to meet the required efficiency and effectiveness, and much research has been devoted to big data analytics. Nevertheless, some methodologies, such as clustering algorithms, require further research with regard to performance, usefulness, and other factors. This motivates the development of a model that supports proper assessment of big data analytics and the effective use of this methodology to retrieve relevant knowledge. In our proposed work, we recorded and analyzed several big data sets and surveyed the relevant current approaches. In this paper, we propose a new clustering technique based on a dimensionality reduction approach. For the implementation, we used real-time streaming data that is unstructured and sometimes noisy. The proposed hybrid clustering technique improves both clustering accuracy and the time needed to generate effective clusters on large unstructured data. We validate the findings by testing the proposed methodology on available data sets and by comparing the effectiveness of the developed system with that of current systems.
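The abstract does not say which dimensionality reduction method or which hybrid clustering steps the authors use. Purely as an illustration of the general idea of reducing dimensionality before clustering, the sketch below assumes Gaussian random projection for the reduction step and plain k-means (named in the keywords) for the clustering step, with a deterministic farthest-first initialization. All function names (`random_projection`, `farthest_first_centroids`, `kmeans`) and the toy data are hypothetical and are not taken from the paper.

```python
import random

def random_projection(points, out_dim, seed=0):
    """Assumed reduction step: project points with a Gaussian random matrix."""
    rng = random.Random(seed)
    in_dim = len(points[0])
    scale = out_dim ** -0.5  # keeps squared distances roughly preserved on average
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
              for _ in range(out_dim)]
    return [[sum(w * x for w, x in zip(row, p)) for row in matrix]
            for p in points]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first_centroids(points, k):
    """Deterministic init: start at the first point, then repeatedly add the
    point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points,
                  key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means: alternate assignment and centroid update."""
    centroids = farthest_first_centroids(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point takes the label of its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy data (hypothetical): two tight, well-separated groups in 10 dimensions.
rng = random.Random(1)
blob_a = [[rng.uniform(-0.05, 0.05) for _ in range(10)] for _ in range(20)]
blob_b = [[10.0 + rng.uniform(-0.05, 0.05) for _ in range(10)] for _ in range(20)]
data = blob_a + blob_b

reduced = random_projection(data, out_dim=3)   # 10 dimensions -> 3 dimensions
labels, centroids = kmeans(reduced, k=2)       # cluster in the reduced space
```

Clustering in the reduced space lowers the cost of every distance computation, which is one plausible reason a reduction step can shorten clustering time on large data; whether it also preserves accuracy depends on the method and the data.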
Pages: 702-717
Page count: 16