Optimization of sub-query processing in distributed data integration systems

Times cited: 10
Authors
Chen, Gang [1]
Wu, Yongwei [1]
Liu, Jia [1]
Yang, Guangwen [1]
Zheng, Weimin [1]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Tsinghua Natl Lab Informat Sci & Technol, Beijing 100084, Peoples R China
Keywords
Cloud computing; Grid computing; Data integration; Query; Data flow
DOI
10.1016/j.jnca.2010.06.007
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Data integration systems (DIS) are becoming essential as Cloud/Grid applications need to integrate and analyze data from geographically distributed data sources. A DIS gathers data from multiple remote sources, then integrates and analyzes the data to obtain a query result. Because Clouds/Grids are distributed over wide-area networks, communication cost usually dominates overall query response time, so query performance can be expected to improve when communication cost is minimized. In our method, the DIS uses a data-flow-style query execution model. Each query plan is mapped to a group of μEngines, each of which is a program corresponding to a particular operator, so multiple sub-queries from concurrent queries can share μEngines. We reconstruct these sub-queries to exploit overlapping data among them. As a result, all sub-queries still obtain their results while overall communication overhead is reduced. Experimental results show that our reconstruction algorithm reduces the average query completion time by 32-48% when the DIS runs a group of parameterized queries, and by 25-35% when it runs a group of non-parameterized queries. © 2010 Elsevier Ltd. All rights reserved.
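The idea of reconstructing sub-queries to exploit overlapping data can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes, purely for illustration, that every sub-query is an inclusive integer range scan against the same remote source, and the names `SubQuery`, `merge_overlapping`, and `rows_transferred` are hypothetical. Overlapping ranges are merged into one fetch, so shared rows cross the network once instead of once per query.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SubQuery:
    """Hypothetical sub-query: an inclusive key range requested by one query."""
    query_id: str
    low: int
    high: int


def merge_overlapping(subqueries: List[SubQuery]) -> List[Tuple[int, int, List[str]]]:
    """Merge sub-queries whose ranges overlap into combined fetches.

    Returns (low, high, query_ids) tuples; each tuple is one remote fetch
    whose result can be split locally among the listed queries.
    """
    merged: List[Tuple[int, int, List[str]]] = []
    for sq in sorted(subqueries, key=lambda s: (s.low, s.high)):
        if merged and sq.low <= merged[-1][1]:
            # Overlaps the previous fetch: widen it and record the extra consumer.
            low, high, ids = merged[-1]
            merged[-1] = (low, max(high, sq.high), ids + [sq.query_id])
        else:
            merged.append((sq.low, sq.high, [sq.query_id]))
    return merged


def rows_transferred(ranges: List[Tuple[int, int]]) -> int:
    """Count rows shipped over the network, assuming one row per key (an assumption)."""
    return sum(high - low + 1 for low, high in ranges)


if __name__ == "__main__":
    subs = [
        SubQuery("q1", 1, 100),
        SubQuery("q2", 50, 150),
        SubQuery("q3", 200, 250),
    ]
    merged = merge_overlapping(subs)
    separate = rows_transferred([(s.low, s.high) for s in subs])
    shared = rows_transferred([(low, high) for low, high, _ in merged])
    print(merged)
    print(separate, shared)
```

In this toy setting, the keys 50 to 100 requested by both `q1` and `q2` are transferred once. A real system would also have to weigh the cost of splitting the merged result back out to each query, which this sketch ignores.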
Pages: 1035-1042
Page count: 8
Related papers
50 in total
  • [21] CONFLUENCE: Adaptive Spatiotemporal Data Integration Using Distributed Query Relaxation Over Heterogeneous Observational Datasets
    Mitra, Saptashwa
    Pallickara, Sangmi Lee
    [J]. 2018 IEEE/ACM 11TH INTERNATIONAL CONFERENCE ON UTILITY AND CLOUD COMPUTING (UCC), 2018, : 184 - 193
  • [22] Join and multi-join processing in data integration systems
    Tan, KL
    Eng, PK
    Ooi, BC
    Zhang, M
    [J]. DATA & KNOWLEDGE ENGINEERING, 2002, 40 (02) : 217 - 239
  • [23] A Distributed Query Method for RDF Data on Spark
    Guo, Minru
    Wang, Jingbin
    [J]. BIG DATA TECHNOLOGY AND APPLICATIONS, 2016, 590 : 102 - 115
  • [24] Query Decomposition Strategy for Integration of Semistructured Data
    Handoko
    Getta, J. R.
    [J]. 16TH INTERNATIONAL CONFERENCE ON INFORMATION INTEGRATION AND WEB-BASED APPLICATIONS & SERVICES (IIWAS 2014), 2014, : 459 - 463
  • [25] Query Optimization Based on Data Provenance
    Huang Li
    Cheng Hongbing
    [J]. NEW TRENDS AND APPLICATIONS OF COMPUTER-AIDED MATERIAL AND ENGINEERING, 2011, 186 : 586 - 590
  • [26] An adaptive cost model for distributed query optimization on the grid
    Slimani, Y
    Najjar, F
    Mami, N
    [J]. ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS 2004: OTM 2004 WORKSHOPS, PROCEEDINGS, 2004, 3292 : 79 - 87
  • [27] STATIC OPTIMIZATION OF DATA INTEGRATION PLANS IN GLOBAL INFORMATION SYSTEMS
    Getta, Janusz R.
    [J]. ICEIS 2011: PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS, VOL 1, 2011, : 141 - 150
  • [28] Data Asset by Query Processing In Client and Server
    Prasanna, E.
    Gunasekaran, G.
    [J]. 2016 INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND EMBEDDED SYSTEMS (ICICES), 2016,
  • [29] MapReduce skyline query processing with partitioning and distributed dominance tests
    Koh, Jia-Ling
    Chen, Chia-Ching
    Chan, Chih-Yu
    Chen, Arbee L. P.
    [J]. INFORMATION SCIENCES, 2017, 375 : 114 - 137
  • [30] Uncertain top-k query processing in distributed environments
    Xite Wang
    Derong Shen
    Ge Yu
    [J]. Distributed and Parallel Databases, 2016, 34 : 567 - 589