Scalable Data Partitioning Techniques for Distributed Data Processing in Cloud Environments: A Review

Cited by: 1
Authors
Ponnusamy, Sivakumar [1]
Gupta, Pankaj [2]
Affiliations
[1] Cognizant Technol Solut US Corp, Richmond, VA 23233 USA
[2] Discover Financial Serv, Riverwoods, IL 60015 USA
Keywords
Content-based partitioning; dynamic partitioning; graph-based data partitioning; hash partitioning; key-based partitioning; principal component analysis (PCA); range partitioning; big data
DOI
10.1109/ACCESS.2024.3365810
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Cloud storage allows individuals to store and access data from remote locations, providing the convenience of on-demand access to high-quality cloud applications. This eliminates the need for individuals to manage local hardware and software. The cloud storage system facilitates the efficient storage of data on cloud servers, allowing users to work with their data seamlessly without encountering resource constraints such as memory or storage limitations. Cloud computing is a technology that shows great promise owing to its ability to provide unlimited resources for computing and data storage services. These services are crucial for effectively managing the data according to specific requirements. In the current system, data is saved in the cloud using dynamic data operations and computations. This study explores the underlying principles of scalable data-partitioning techniques in the context of distributed data processing in cloud environments. The significance of this study lies in the increasing dependence of enterprises on cloud platforms for data-intensive tasks such as machine learning, data analytics, and real-time data processing. This study examines several data-partitioning strategies and methodologies developed to address the unique issues posed by cloud systems. The evaluation includes an examination of their influence on the scalability, load distribution, and overall efficiency of the system. The main aim of this study is to enhance the domain of cloud-based data-processing techniques, thereby enabling enterprises to effectively leverage the full potential of the cloud for data-centric projects.
Pages: 26735-26746
Number of pages: 12