An efficient and scalable SPARQL query processing framework for big data using MapReduce and hybrid optimum load balancing

Cited by: 0
Authors
Kumar, V. Naveen [1]
Kumar, P. S. Ashok [2]
Affiliations
[1] Visvesvaraya Technol Univ, Don Bosco Inst Technol, Bengaluru 560074, Karnataka, India
[2] Visvesvaraya Technol Univ, ACS Coll Engn, Dept CSE, Bengaluru 560074, Karnataka, India
Keywords
RDF data storage; SPARQL querying; Hadoop; Extended vertical partitioning; Hybrid optimum load balancing; RDF data
DOI
10.1016/j.datak.2023.102239
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The growing volume of RDF (Resource Description Framework) data calls for a Hadoop platform to process queries over large datasets. In this work, SPARQL (SPARQL Protocol and RDF Query Language) queries are evaluated on Hadoop with the objective of minimizing the number of joins, using data partitioning to organize the map/reduce jobs. The proposed partitioning techniques reduce both the query evaluation time and the number of cross-node joins. Extended vertical partitioning is proposed for distributed data stores; it splits predicates based on the explicit information of their objects. To access the RDF data, a hybrid monarch butterfly and beetle swarm load-balancing optimization with MapReduce (Hybrid Optimum Load Balancing) is applied. The proposed SPARQL query processing is evaluated over large RDF datasets, and its results are compared with those of existing approaches, indicating the efficiency of the proposed framework. The proposed approach achieves an accuracy of 97%.
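The partitioning idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: plain vertical partitioning stores one (subject, object) table per predicate, and the "extended" variant here assumes that the explicit information about an object is its declared `rdf:type`. That splitting rule, the function names, and the sample triples are all hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def vertical_partition(triples):
    """Group (subject, predicate, object) triples into one table per predicate."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return dict(tables)


def extended_vertical_partition(triples):
    """Split each predicate table further by the object's declared rdf:type.

    Assumption for illustration only: the object's explicit information
    is taken to be its rdf:type; objects without one go to "untyped".
    """
    declared_type = {s: o for s, p, o in triples if p == RDF_TYPE}
    tables = defaultdict(list)
    for s, p, o in triples:
        key = (p, declared_type.get(o, "untyped"))
        tables[key].append((s, o))
    return dict(tables)


# Hypothetical sample data.
triples = [
    ("alice", "rdf:type", "Person"),
    ("bob", "rdf:type", "Person"),
    ("acme", "rdf:type", "Company"),
    ("alice", "knows", "bob"),
    ("alice", "worksFor", "acme"),
    ("bob", "likes", "acme"),
    ("bob", "likes", "alice"),
]

vp = vertical_partition(triples)
evp = extended_vertical_partition(triples)
```

In this sketch, a query pattern such as "who likes a Company" would only need to read the `("likes", "Company")` table instead of scanning the full `likes` table. That is one way finer partitions could reduce the data touched by a join, under the assumed splitting rule.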
Pages: 15