JQPro: Join Query Processing in a Distributed System for Big RDF Data Using the Hash-Merge Join Technique

Cited by: 1
Authors
Elzein, Nahla Mohammed [1]
Majid, Mazlina Abdul [2]
Hashem, Ibrahim Abaker Targio [3]
Ibrahim, Ashraf Osman [4, 5]
Abulfaraj, Anas W. [6]
Binzagr, Faisal [7]
Affiliations
[1] Future Univ, Fac Comp Sci, Khartoum 10553, Sudan
[2] Univ Malaysia Pahang, Fac Comp, Pekan 26600, Malaysia
[3] Univ Sharjah, Coll Comp & Informat, Dept Comp Sci, Sharjah, U Arab Emirates
[4] Univ Malaysia Sabah, Fac Comp & Informat, Data Sci Programme, Kota Kinabalu 88400, Malaysia
[5] Univ Malaysia Sabah, Adv Machine Intelligence Res Grp, Kota Kinabalu 88400, Malaysia
[6] King Abdulaziz Univ, Dept Informat Syst, POB 344, Rabigh 21911, Saudi Arabia
[7] King Abdulaziz Univ, Dept Comp Sci, POB 344, Rabigh 21911, Saudi Arabia
Keywords
semantic web; distributed computing; RDF; big data; SPARKSQL; FRAMEWORK
DOI
10.3390/math11051275
Chinese Library Classification
O1 [Mathematics]
Subject classification codes
0701; 070101
Abstract
Over the last decade, the volume of semantic data has grown exponentially, with Resource Description Framework (RDF) repositories now exceeding trillions of triples, and RDF datasets continue to grow. As the number of RDF triples increases, so does the demand for complex multiple RDF queries. Such queries often contain many common sub-expressions, either within a single query or across multiple queries run as a batch. It is also difficult to minimize the number of RDF queries and the processing time for large amounts of related data in a typical distributed environment. To address these problems, we introduce JQPro, a join query processing model for big RDF data. Building on the MapReduce framework, we developed three new algorithms for join query processing of RDF data: hash-join, sort-merge, and enhanced MapReduce-join. In our experiments, JQPro outperformed two popular systems, gStore and RDF-3X, in average execution time. JQPro was also tested against RDF-3X, RDFox, and PARJs using the LUBM benchmark, where it again performed better than the other models. Overall, JQPro improved execution time by 87.77% and performed better than the selected models.
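For readers unfamiliar with the two classic join strategies named in the abstract, the following is a minimal single-machine sketch of a hash join and a sort-merge join over RDF triple patterns. It is not the JQPro algorithms, which run on MapReduce and are not specified in this record; the toy triples, the predicate names, and the helper functions `match`, `hash_join`, and `sort_merge_join` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Toy RDF triples (subject, predicate, object); all data here is made up.
TRIPLES = [
    ("alice", "worksFor", "univA"),
    ("bob", "worksFor", "univB"),
    ("carol", "worksFor", "univA"),
    ("univA", "locatedIn", "Pekan"),
    ("univB", "locatedIn", "Sharjah"),
]


def match(triples, predicate):
    """Return (subject, object) bindings for the pattern ?s <predicate> ?o."""
    return [(s, o) for s, p, o in triples if p == predicate]


def hash_join(left, right):
    """Join left rows (x, k) with right rows (k, y) on k via a hash table."""
    table = defaultdict(list)
    for k, y in right:  # build phase: index the right input by join key
        table[k].append(y)
    # probe phase: look up each left row's key in the table
    return [(x, k, y) for x, k in left for y in table.get(k, [])]


def sort_merge_join(left, right):
    """Same join, computed by sorting both inputs on k and merging them."""
    left = sorted(left, key=lambda row: row[1])
    right = sorted(right, key=lambda row: row[0])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][1], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side, then emit all pairs.
            i_end = i
            while i_end < len(left) and left[i_end][1] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for x, _ in left[i:i_end]:
                for _, y in right[j:j_end]:
                    out.append((x, lk, y))
            i, j = i_end, j_end
    return out


# Query sketch: ?person worksFor ?org . ?org locatedIn ?city
works = match(TRIPLES, "worksFor")
located = match(TRIPLES, "locatedIn")
hj = hash_join(works, located)
smj = sort_merge_join(works, located)
```

Both functions return the same set of bindings, possibly in a different order. In a distributed setting, a hash join would typically partition rows by join key across workers, while a sort-merge join relies on sorted input, which a MapReduce shuffle phase can provide.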
Pages: 20