Heterogeneous biological data integration with declarative query language

Cited by: 4
Authors
Nguyen, H. [1]
Michel, L. [2]
Thompson, J. D. [3]
Poch, O. [3]
Affiliations
[1] IGBMC, F-67404 Illkirch Graffenstaden, France
[2] Observ Astron, Equipe Hautes Energies, Strasbourg, France
[3] ICube UMR7357, Fac Med, F-67085 Strasbourg, France
Keywords
SYSTEM; BIOINFORMATICS; MANAGEMENT; DATABASES; FRAMEWORK; RESOURCE
DOI
10.1147/JRD.2014.2309032
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The requirements for scalable data integration systems for modern biology are indisputable, due to the very large, heterogeneous, and complex datasets available in public databases. The management and fusion of this "big data" with local databases represents a major challenge, since it underlies the computational inferences and models that will be subsequently generated and validated experimentally. In this paper, we present an alternative conception for local data integration, called BIRD (Biological Integration and Retrieval Data), based on four concepts: (i) a hybrid flat file and relational database architecture permits the rapid management of large volumes of heterogeneous datasets; (ii) a generic data model allows the simultaneous organization and classification of local databases according to real-world requirements; (iii) configuration rules are used to divide and map each data resource into several data model entities; and (iv) a simple, declarative query language (BIRD-QL) facilitates information extraction from heterogeneous datasets. This flexible, generic design allows the integration of diverse data formats in a searchable database with high-level functionalities depending on the specific scientific context. It has been validated in the context of real-world projects, notably the SM2PH (Structural Mutation to the Phenotypes of Human Pathologies) project.
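The hybrid flat file and relational idea in concepts (i) and (iii) can be illustrated with a minimal sketch. This is not BIRD's implementation and does not use BIRD-QL syntax, neither of which is given in this record. The record layout (`ID`/`DE`/`SQ` tags, `//` terminator), the `RULES` mapping, the `entry` table, and both function names are hypothetical, chosen only to show a relational index pointing back into an untouched flat file.

```python
import io
import sqlite3

# Hypothetical flat-file content: two records in a made-up ID/DE/SQ layout.
FLAT_FILE = (
    b"ID P1\nDE kinase\nSQ MKT\n//\n"
    b"ID P2\nDE phosphatase\nSQ MAA\n//\n"
)

# Hypothetical configuration rule: which line tags map to which indexed columns.
RULES = {"ID": "entry_id", "DE": "description"}


def build_index(flat, conn):
    """Scan the flat file once, storing mapped fields plus byte offsets."""
    conn.execute(
        "CREATE TABLE entry (entry_id TEXT PRIMARY KEY, description TEXT, "
        "offset INTEGER, length INTEGER)"
    )
    start, fields = 0, {}
    while True:
        line = flat.readline()
        if not line:
            break
        text = line.decode("ascii").rstrip("\n")
        if text == "//":  # record terminator: flush the current record
            end = flat.tell()
            conn.execute(
                "INSERT INTO entry VALUES (?, ?, ?, ?)",
                (fields.get("entry_id"), fields.get("description"),
                 start, end - start),
            )
            start, fields = end, {}
            continue
        tag, _, value = text.partition(" ")
        if tag in RULES:  # only tags named in the rules are indexed
            fields[RULES[tag]] = value


def fetch_records(flat, conn, keyword):
    """Query the relational index, then read full records from the flat file."""
    rows = conn.execute(
        "SELECT entry_id, offset, length FROM entry "
        "WHERE description LIKE ? ORDER BY entry_id",
        (f"%{keyword}%",),
    ).fetchall()
    result = {}
    for entry_id, offset, length in rows:
        flat.seek(offset)
        result[entry_id] = flat.read(length).decode("ascii")
    return result


flat = io.BytesIO(FLAT_FILE)
conn = sqlite3.connect(":memory:")
build_index(flat, conn)
hits = fetch_records(flat, conn, "kinase")
```

In this sketch only the fields named in the rules are copied into the relational table, while the full record (including the unindexed `SQ` line) is read back from the flat file by byte offset.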
Pages: 12