LakeCompass: An End-to-End System for Data Maintenance, Search and Analysis in Data Lakes

Cited by: 0
Authors
Chai, Chengliang [1]
Deng, Yuhao [1]
Zhan, Yutong [1]
Cao, Ziqi [1]
Zhang, Yuanfang [1]
Cao, Lei [2]
Wang, Yuping [1]
Zhang, Zhiwei [1]
Yuan, Ye [1]
Wang, Guoren [1]
Tang, Nan [3]
Affiliations
[1] Beijing Inst Technol, Beijing, Peoples R China
[2] Univ Arizona, MIT, Tempe, AZ USA
[3] HKUST, Guangzhou, Peoples R China
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2024 / Vol. 17 / No. 12
Funding
National Key Research and Development Program;
DOI
10.14778/3685800.3685880
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Searching tables from poorly maintained data lakes has long been recognized as a formidable challenge in the realm of data management. There are three pivotal tasks: keyword-based, joinable and unionable table search, which form the backbone of tasks that aim to make sense of diverse datasets, such as machine learning. In this demo, we propose LakeCompass, an end-to-end prototype system that maintains abundant tabular data, supports all above search tasks with high efficacy, and well serves downstream ML modeling. To be specific, LakeCompass manages numerous real tables over which diverse types of indexes are built to support efficient search based on different user requirements. Particularly, LakeCompass could automatically integrate these discovered tables to improve the downstream model performance in an iterative approach. Finally, we provide both Python APIs and Web interface to facilitate flexible user interaction.
Pages: 4381-4384
Page count: 4
Related papers
50 in total
  • [31] End-to-End Data Paths: Quickest or Most Reliable?
    Xue, Guoliang
    IEEE COMMUNICATIONS LETTERS, 1998, 2 (06) : 156 - 158
  • [32] Demonstration of End-to-End Automation of DNA Data Storage
    Takahashi, Christopher N.
    Nguyen, Bichlien H.
    Strauss, Karin
    Ceze, Luis
    SCIENTIFIC REPORTS, 2019, 9 (1)
  • [33] HYPERSPECTRAL DATA PROCESSING: AN OPPORTUNITY FOR END-TO-END PROCESSING
    Cole, Marge
    Wilson, Anne
    Little, Michael
    IGARSS 2018 - 2018 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2018, : 6328 - 6331
  • [34] Data networking: an end-to-end solution with Alcatel products
    Alcatel Network Systems
    Alcatel Telecommun Rev, 4th Quarter (257-264):
  • [35] BROADTALK - END-TO-END COMMUNICATIONS WITH DATA-BROADCASTING
    BARRAT, J
    COMPUTER COMMUNICATIONS, 1991, 14 (01) : 53 - 54
  • [36] Constructing end-to-end models using ECOPATH data
    Steele, John H.
    Ruzicka, James J.
    JOURNAL OF MARINE SYSTEMS, 2011, 87 (3-4) : 227 - 238
  • [37] An End-to-End Secure Solution for IoMT Data Exchange
    El Jaouhari, Saad
    Tamani, Nouredine
    APPLIED CRYPTOGRAPHY AND NETWORK SECURITY WORKSHOPS, PT I, ACNS 2024-AIBLOCK 2024, AIHWS 2024, AIOTS 2024, SCI 2024, AAC 2024, SIMLA 2024, LLE 2024, AND CIMSS 2024, 2024, 14586 : 3 - 15
  • [38] End-to-End Scientific Data Management using Workflows
    Simmhan, Yogesh
    IEEE CONGRESS ON SERVICES 2008, PT I, PROCEEDINGS, 2008, : 472 - 473
  • [39] Data Augmentation for End-to-End Optical Music Recognition
    Lopez-Gutierrez, Juan C.
    Valero-Mas, Jose J.
    Castellanos, Francisco J.
    Calvo-Zaragoza, Jorge
    DOCUMENT ANALYSIS AND RECOGNITION, ICDAR 2021 WORKSHOPS, PT I, 2021, 12916 : 59 - 73
  • [40] MOZAIK: An End-to-End Secure Data Sharing Platform
    Abidin, Aysajan
    Marquet, Enzo
    Moeyersons, Jerico
    Limani, Xhulio
    Pohle, Erik
    Van Kenhove, Michiel
    Marquez-Barja, Johann M.
    Slamnik-Krijestorac, Nina
    Volckaert, Bruno
    PROCEEDINGS OF THE 2ND ACM DATA ECONOMY WORKSHOP, DEC 2023, 2023, : 34 - 40