LakeCompass: An End-to-End System for Data Maintenance, Search and Analysis in Data Lakes

Cited by: 0
Authors
Chai, Chengliang [1]
Deng, Yuhao [1]
Zhan, Yutong [1]
Cao, Ziqi [1]
Zhang, Yuanfang [1]
Cao, Lei [2]
Wang, Yuping [1]
Zhang, Zhiwei [1]
Yuan, Ye [1]
Wang, Guoren [1]
Tang, Nan [3]
Affiliations
[1] Beijing Inst Technol, Beijing, Peoples R China
[2] Univ Arizona, MIT, Tempe, AZ USA
[3] HKUST, Guangzhou, Peoples R China
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2024 / Vol. 17 / No. 12
Funding
National Key Research and Development Program of China;
DOI
10.14778/3685800.3685880
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Searching for tables in poorly maintained data lakes has long been recognized as a formidable challenge in data management. There are three pivotal tasks: keyword-based, joinable and unionable table search, which form the backbone of tasks that aim to make sense of diverse datasets, such as machine learning. In this demo, we propose LakeCompass, an end-to-end prototype system that maintains abundant tabular data, supports all three search tasks with high efficacy, and serves downstream ML modeling well. Specifically, LakeCompass manages numerous real tables, over which diverse types of indexes are built to support efficient search based on different user requirements. In particular, LakeCompass can automatically integrate the discovered tables to improve downstream model performance in an iterative manner. Finally, we provide both Python APIs and a Web interface to facilitate flexible user interaction.
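To make the three search tasks named in the abstract concrete, the following is a minimal sketch in plain Python. It is not LakeCompass's API or its indexing method: the function names (`keyword_search`, `joinable_search`, `unionable_search`), the toy `lake` data, the in-memory dictionary layout, and the use of set containment as the similarity score are all assumptions made for illustration. A real system would replace the brute-force scans with indexes.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory "data lake": table name -> {column name -> cell values}.
Table = Dict[str, List[str]]
lake: Dict[str, Table] = {
    "cities": {"city": ["Paris", "Rome", "Oslo"], "country": ["FR", "IT", "NO"]},
    "sales": {"city": ["Paris", "Rome", "Lima"], "revenue": ["10", "20", "30"]},
}


def containment(query: set, candidate: set) -> float:
    """Fraction of query values that also appear in the candidate set."""
    if not query:
        return 0.0
    return len(query & candidate) / len(query)


def keyword_search(keyword: str, lake: Dict[str, Table]) -> List[str]:
    """Tables whose name or any column name contains the keyword (case-insensitive)."""
    kw = keyword.lower()
    return [
        name
        for name, table in lake.items()
        if kw in name.lower() or any(kw in col.lower() for col in table)
    ]


def joinable_search(
    query_column: List[str], lake: Dict[str, Table], threshold: float = 0.5
) -> List[Tuple[str, str, float]]:
    """Columns whose values contain at least `threshold` of the query column's values."""
    q = set(query_column)
    hits = []
    for table_name, table in lake.items():
        for col_name, values in table.items():
            score = containment(q, set(values))
            if score >= threshold:
                hits.append((table_name, col_name, score))
    return sorted(hits, key=lambda h: h[2], reverse=True)


def unionable_search(
    query_table: Table, lake: Dict[str, Table], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Tables where, on average, each query column has a well-matching candidate column."""
    hits = []
    for table_name, table in lake.items():
        if not query_table or not table:
            continue
        # For each query column, take its best containment over the candidate's columns.
        best = [
            max(containment(set(q_vals), set(c_vals)) for c_vals in table.values())
            for q_vals in query_table.values()
        ]
        score = sum(best) / len(best)
        if score >= threshold:
            hits.append((table_name, score))
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

In this sketch, joinability is judged on a single column's value overlap, while unionability averages the best per-column match across the whole query table, so column names do not need to agree. Both choices are simplifications chosen for brevity rather than a description of the paper's method.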
Pages: 4381-4384
Page count: 4