Data Source Selection Based on an Improved Greedy Genetic Algorithm

Times cited: 5
Authors
Yang, Jian [1]
Xing, Chunxiao [2]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Comp & Commun Engn, Beijing 100083, Peoples R China
[2] Tsinghua Univ, Beijing Natl Res Ctr Informat Sci & Technol, Res Inst Informat, Inst Internet Ind, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
Source
SYMMETRY-BASEL | 2019 / Vol. 11 / No. 2
Keywords
data integration; quality; source selection; improved greedy genetic algorithm (IGGA); knapsack problem
DOI
10.3390/sym11020273
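Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]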
Subject classification codes
07; 0710; 09
Abstract
The development of information technology has led to a sharp increase in data volume. This tremendous amount of data has become a strategic asset that allows businesses to derive superior market intelligence or improve existing operations, so people expect to consolidate and use as much data as possible. However, integrating too much data incurs high costs, such as the cost of purchasing and cleaning it. Under limited resources, the goal is therefore to obtain as much integration value as possible. In addition, the uneven quality of data sources makes multi-source selection more difficult, because low-quality sources can seriously degrade integration results without delivering the desired quality gain. In this paper, we study how to balance data gain and cost in source selection; specifically, we maximize the gain of the data under a given budget. We propose an improved greedy genetic algorithm (IGGA) to solve the source selection problem and carry out extensive experimental evaluations on real and synthetic datasets. The empirical results show that the proposed algorithm performs considerably better in terms of solution quality.
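The abstract frames source selection as maximizing data gain under a fixed budget, and the keywords tie it to the knapsack problem. The Python sketch below illustrates one common way to combine a greedy heuristic with a genetic algorithm for such a budgeted 0/1 selection. Each chromosome is a bit vector over sources. Infeasible chromosomes are repaired by dropping the selected sources with the lowest gain-to-cost ratio, and leftover budget is then filled greedily by ratio. This is not the authors' IGGA. It assumes each source has a fixed, additive gain and a strictly positive cost, and the names `select_sources` and `greedy_repair`, as well as all parameter defaults, are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def greedy_repair(bits: List[int], gains: Sequence[float], costs: Sequence[float],
                  budget: float, order: Sequence[int]) -> List[int]:
    """Make a selection feasible, then greedily use the leftover budget.

    `order` lists source indices by descending gain/cost ratio.
    Assumption (hypothetical model): gains are additive and costs are > 0.
    """
    bits = list(bits)
    total = sum(c for b, c in zip(bits, costs) if b)
    # Drop phase: remove the worst-ratio selected sources until within budget.
    for i in reversed(order):
        if total <= budget:
            break
        if bits[i]:
            bits[i] = 0
            total -= costs[i]
    # Add phase: insert the best-ratio unselected sources that still fit.
    for i in order:
        if not bits[i] and total + costs[i] <= budget:
            bits[i] = 1
            total += costs[i]
    return bits


def select_sources(gains: Sequence[float], costs: Sequence[float], budget: float,
                   pop_size: int = 30, generations: int = 100,
                   crossover_rate: float = 0.9, mutation_rate: float = None,
                   seed: int = 0) -> Tuple[List[int], float]:
    """Hypothetical greedy-repair GA for budgeted source selection."""
    rng = random.Random(seed)
    n = len(gains)
    if mutation_rate is None:
        mutation_rate = 1.0 / n
    order = sorted(range(n), key=lambda i: gains[i] / costs[i], reverse=True)

    def fitness(bits: List[int]) -> float:
        return sum(g for b, g in zip(bits, gains) if b)

    def repair(bits: List[int]) -> List[int]:
        return greedy_repair(bits, gains, costs, budget, order)

    # Seed the population with the pure greedy solution plus random repaired ones.
    population = [repair([0] * n)]
    while len(population) < pop_size:
        population.append(repair([rng.randint(0, 1) for _ in range(n)]))

    def tournament() -> List[int]:
        # Binary tournament selection: the fitter of two random chromosomes wins.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(population, key=fitness)
    for _ in range(generations):
        children = [best]  # elitism: never lose the best solution found so far
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if n > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = list(p1)
            # Bit-flip mutation, then repair so every child respects the budget.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            children.append(repair(child))
        population = children
        best = max(population, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    gains = [10, 7, 8, 3, 6]
    costs = [5, 4, 5, 2, 3]
    best, value = select_sources(gains, costs, budget=10)
    print(best, value)
```

Seeding the population with the greedy solution and keeping the best chromosome each generation means the sketch never returns a worse selection than plain ratio-greedy; the genetic operators only have room to improve on it. A real source-selection gain that depends on overlap or quality between sources would not be additive, so the fitness function here would need to be replaced.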
Pages: 17