Heterogeneous biological data integration with declarative query language

Times cited: 4
Authors
Nguyen, H. [1]
Michel, L. [2]
Thompson, J. D. [3]
Poch, O. [3]
Affiliations
[1] IGBMC, F-67404 Illkirch Graffenstaden, France
[2] Observ Astron, Equipe Hautes Energies, Strasbourg, France
[3] ICube UMR7357, Fac Med, F-67085 Strasbourg, France
Keywords
SYSTEM; BIOINFORMATICS; MANAGEMENT; DATABASES; FRAMEWORK; RESOURCE
DOI
10.1147/JRD.2014.2309032
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The need for scalable data integration systems in modern biology is indisputable, given the very large, heterogeneous, and complex datasets available in public databases. Managing and fusing this "big data" with local databases is a major challenge, because it underlies the computational inferences and models that are subsequently generated and validated experimentally. In this paper, we present an alternative design for local data integration, called BIRD (Biological Integration and Retrieval Data), based on four concepts: (i) a hybrid flat-file and relational database architecture that enables rapid management of large volumes of heterogeneous datasets; (ii) a generic data model that allows local databases to be organized and classified simultaneously according to real-world requirements; (iii) configuration rules that divide each data resource and map it onto several data model entities; and (iv) a simple, declarative query language (BIRD-QL) that facilitates information extraction from heterogeneous datasets. This flexible, generic design allows diverse data formats to be integrated into a searchable database whose high-level functionality can be adapted to the specific scientific context. It has been validated in real-world projects, notably the SM2PH (Structural Mutation to the Phenotypes of Human Pathologies) project.
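The abstract describes BIRD only at the level of concepts. As a rough illustration of how concepts (i), (iii), and (iv) could fit together, here is a minimal Python sketch under loudly stated assumptions: the input is a UniProt-style flat file (two-letter line codes, "//" ending each entry), `RULES` is a hypothetical rule table mapping line codes to (entity, attribute) pairs, and `run_query` compiles a hypothetical dictionary-based query into SQL over an in-memory SQLite index. None of these names or formats are BIRD's actual configuration syntax or BIRD-QL.

```python
import sqlite3

# Hypothetical UniProt-like flat file (illustrative data, not from the paper):
# two-letter line code, value from column 6, "//" ends each entry.
FLAT_FILE = """ID   P12345
AC   P12345
DE   Example protein alpha
OS   Homo sapiens
//
ID   Q67890
AC   Q67890
DE   Example protein beta
OS   Mus musculus
//
"""

# Hypothetical configuration rules: line code -> (entity, attribute).
# One resource entry is split across several data model entities.
RULES = {
    "AC": ("protein", "accession"),
    "DE": ("protein", "description"),
    "OS": ("organism", "name"),
}

def parse_records(text):
    """Return (fields, raw_text) pairs, one per '//'-terminated entry."""
    records, fields, lines = [], {}, []
    for line in text.splitlines():
        if line.startswith("//"):
            records.append((fields, "\n".join(lines)))
            fields, lines = {}, []
        else:
            lines.append(line)
            if len(line) > 5:
                fields[line[:2]] = line[5:].strip()
    return records

def apply_rules(fields, rules):
    """Map one entry's fields onto {entity: {attribute: value}}; unmapped codes are skipped."""
    entities = {}
    for code, value in fields.items():
        if code in rules:
            entity, attribute = rules[code]
            entities.setdefault(entity, {})[attribute] = value
    return entities

def build_index(records, rules):
    """Create one SQLite table per entity and load the rule-extracted attributes.

    Raw flat-file text is not copied; record_id points back into `records`.
    Table and column names come from the trusted rule table, not user input.
    """
    conn = sqlite3.connect(":memory:")
    schema = {}
    for entity, attribute in rules.values():
        schema.setdefault(entity, []).append(attribute)
    for entity, attrs in schema.items():
        cols = ", ".join(["record_id INTEGER"] + [f"{a} TEXT" for a in attrs])
        conn.execute(f"CREATE TABLE {entity} ({cols})")
    for record_id, (fields, _raw) in enumerate(records):
        for entity, values in apply_rules(fields, rules).items():
            cols = ["record_id"] + list(values)
            marks = ", ".join("?" for _ in cols)
            conn.execute(
                f"INSERT INTO {entity} ({', '.join(cols)}) VALUES ({marks})",
                [record_id] + list(values.values()),
            )
    return conn

def run_query(conn, select, where):
    """Compile a tiny declarative query into SQL, joining entities on record_id.

    `select` lists 'entity.attribute' fields; `where` maps fields to required values.
    """
    entities = sorted({f.split(".")[0] for f in list(select) + list(where)})
    sql = f"SELECT {', '.join(select)} FROM {entities[0]}"
    for other in entities[1:]:
        sql += f" JOIN {other} ON {other}.record_id = {entities[0]}.record_id"
    if where:
        sql += " WHERE " + " AND ".join(f"{f} = ?" for f in where)
    return conn.execute(sql, list(where.values())).fetchall()

if __name__ == "__main__":
    records = parse_records(FLAT_FILE)
    conn = build_index(records, RULES)
    print(run_query(conn, ["protein.accession", "protein.description"],
                    {"organism.name": "Homo sapiens"}))
    # Hybrid lookup: the index returns record ids; the flat text holds the full entry.
    for (record_id,) in run_query(conn, ["protein.record_id"],
                                  {"organism.name": "Mus musculus"}):
        print(records[record_id][1])
```

In this sketch the flat-file text remains the primary store and the relational tables only index the attributes that the rules extract, so a query returns attribute values or record identifiers and full entries are read back from the flat text. The abstract does not say whether BIRD divides storage in exactly this way.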
Pages: 12
Related papers (50 in total)
  • [1] A Declarative Query Language Enabled Autonomous Deep Web Search Engine
    Naha, Kallol
    Jamil, Hasan M.
    39TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2024, 2024: 305-312
  • [2] Query performance evaluation of an architecture for fine-grained integration of heterogeneous grid data sources
    Zamboulis, Lucas
    Martin, Nigel
    Poulovassilis, Alexandra
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2010, 26(08): 1073-1091
  • [3] HETEROGENEOUS SENSOR DATA EXPLORATION AND SUSTAINABLE DECLARATIVE MONITORING ARCHITECTURE: APPLICATION TO SMART BUILDING
    Servigne, Sylvie
    Gripay, Yann
    Pinarer, Ozgun
    Samuel, John
    Ozgovde, Atay
    Jay, Jacques
    FIRST INTERNATIONAL CONFERENCE ON SMART DATA AND SMART CITIES, 30TH UDMS, 2016, 4-4(W1): 97-104
  • [4] XML-based approaches for the integration of heterogeneous bio-molecular data
    Mesiti, Marco
    Jimenez-Ruiz, Ernesto
    Sanz, Ismael
    Berlanga-Llavori, Rafael
    Perlasca, Paolo
    Valentini, Giorgio
    Manset, David
    BMC BIOINFORMATICS, 2009, 10: S7
  • [5] A Visual Query Language for Spatial Data Warehouses
    Bimonte, Sandro
    Del Fatto, Vincenzo
    Paolino, Luca
    Sebillo, Monica
    Vitiello, Giuliana
    GEOSPATIAL THINKING, 2010: 43+
  • [6] Enabling Query Processing across Heterogeneous Data Models: A Survey
    Tan, Ran
    Chirkova, Rada
    Gadepally, Vijay
    Mattson, Timothy G.
    2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2017: 3211-3220
  • [7] RDF 1.1: Knowledge Representation and Data Integration Language for the Web
    Tomaszuk, Dominik
    Hyland-Wood, David
    SYMMETRY-BASEL, 2020, 12(01)
  • [8] Data integration in biological research: an overview
    Lapatas, Vasileios
    Stefanidakis, Michalis
    Jimenez, Rafael C.
    Via, Allegra
    Schneider, Maria Victoria
    JOURNAL OF BIOLOGICAL RESEARCH-THESSALONIKI, 2015, 22: 1-16
  • [9] Integration Approaches for Heterogeneous Big Data: A Survey
    Alma'aitah, Wafa' Za'al
    Quraan, Addy
    AL-Aswadi, Fatima N.
    Alkhawaldeh, Rami S.
    Alazab, Moutaz
    Awajan, Albara
    CYBERNETICS AND INFORMATION TECHNOLOGIES, 2024, 24(01): 3-20
  • [10] Teaching the Fundamentals of Biological Data Integration Using Classroom Games
    Schneider, Maria Victoria
    Jimenez, Rafael C.
    PLOS COMPUTATIONAL BIOLOGY, 2012, 8(12)