Scalable Data Partitioning Techniques for Distributed Data Processing in Cloud Environments: A Review

Times cited: 1

Authors:

Ponnusamy, Sivakumar [1]

Gupta, Pankaj [2]

Affiliations:

[1] Cognizant Technol Solut US Corp, Richmond, VA 23233 USA

[2] Discover Financial Serv, Riverwoods, IL 60015 USA

Source:

IEEE Access | 2024 / Vol. 12

Keywords:

Content-based partitioning; dynamic partitioning; graph-based data partitioning; hash partitioning; key-based partitioning; principal component analysis (PCA); range partitioning; BIG DATA;

DOI:

10.1109/ACCESS.2024.3365810

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Cloud storage allows individuals to store and access data from remote locations, providing on-demand access to high-quality cloud applications and eliminating the need to manage local hardware and software. Cloud storage systems store data efficiently on cloud servers, allowing users to work with their data without running into resource constraints such as memory or storage limits. Cloud computing is a promising technology because it can provide virtually unlimited resources for computing and data storage services, which are crucial for managing data according to specific requirements. In current systems, data stored in the cloud is handled through dynamic data operations and computations. This study explores the underlying principles of scalable data-partitioning techniques for distributed data processing in cloud environments. Its significance lies in the increasing dependence of enterprises on cloud platforms for data-intensive tasks such as machine learning, data analytics, and real-time data processing. The study examines several data-partitioning strategies and methodologies developed to address the unique issues posed by cloud systems, and evaluates their influence on scalability, load distribution, and overall system efficiency. The main aim of this study is to advance cloud-based data-processing techniques, enabling enterprises to leverage the full potential of the cloud for data-centric projects.


Pages: 26735-26746

Page count: 12

Cited references: 44
