ArchaeoDAL: A Data Lake for Archaeological Data Management and Analytics

Times cited: 2
Authors
Liu, Pengfei [1]
Loudcher, Sabine [1]
Darmont, Jerome [1]
Nous, Camille [2]
Affiliations
[1] Univ Lyon, Lyon 2, UR ERIC, Lyon, France
[2] Lab Cogitamus, Paris, France
Source
IDEAS 2021: 25TH INTERNATIONAL DATABASE ENGINEERING & APPLICATIONS SYMPOSIUM | 2021
Keywords
Data lake architecture; Data lake implementation; Metadata management; Archaeological data; Thesaurus; Big data
DOI
10.1145/3472163.3472266
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With emerging technologies such as satellites and drones, archaeologists collect data over large areas. However, such data become difficult to process in a timely manner. Archaeological data also come in many different formats (images, texts, sensor data) and can be structured, semi-structured or unstructured. This variety makes the data difficult to collect, store, manage, search and analyze effectively. A few approaches have been proposed, but none of them covers the full data lifecycle or provides an efficient data management system. Hence, we propose using a data lake to provide centralized data stores that host heterogeneous data, as well as tools for data quality checking, cleaning, transformation and analysis. In this paper, we propose a generic, flexible and complete data lake architecture. Our metadata management system exploits goldMEDAL, which is the most generic metadata model currently available. Finally, we detail the concrete implementation of this architecture for an archaeological project.
Pages: 252-262
Number of pages: 11