Heterogeneous biological data integration with declarative query language

Cited by: 4
Authors
Nguyen, H. [1]
Michel, L. [2]
Thompson, J. D. [3]
Poch, O. [3]
Affiliations
[1] IGBMC, F-67404 Illkirch Graffenstaden, France
[2] Observ Astron, Equipe Hautes Energies, Strasbourg, France
[3] ICube UMR7357, Fac Med, F-67085 Strasbourg, France
Keywords
SYSTEM; BIOINFORMATICS; MANAGEMENT; DATABASES; FRAMEWORK; RESOURCE
DOI
10.1147/JRD.2014.2309032
Chinese Library Classification (CLC) number
TP3 [Computing technology; computer technology]
Discipline classification code
0812
Abstract
The requirements for scalable data integration systems in modern biology are indisputable, owing to the very large, heterogeneous, and complex datasets available in public databases. The management and fusion of this "big data" with local databases represents a major challenge, since it underlies the computational inferences and models that will subsequently be generated and validated experimentally. In this paper, we present an alternative design for local data integration, called BIRD (Biological Integration and Retrieval Data), based on four concepts: (i) a hybrid architecture combining flat files and a relational database permits the rapid management of large volumes of heterogeneous datasets; (ii) a generic data model allows local databases to be organized and classified simultaneously according to real-world requirements; (iii) configuration rules divide and map each data resource into several data model entities; and (iv) a simple, declarative query language (BIRD-QL) facilitates information extraction from heterogeneous datasets. This flexible, generic design allows diverse data formats to be integrated into a searchable database with high-level functionalities that depend on the specific scientific context. BIRD has been validated in real-world projects, notably the SM2PH (Structural Mutation to the Phenotypes of Human Pathologies) project.
Pages: 12
Related papers
50 in total
  • [31] Multi-source heterogeneous data integration for incident likelihood analysis
    Kamil, Mohammad Zaid
    Khan, Faisal
    Amyotte, Paul
    Ahmed, Salim
    COMPUTERS & CHEMICAL ENGINEERING, 2024, 185
  • [32] Integration of heterogeneous scientific data using workflows - A case study in bioinformatics
    Vouk, MA
    ITI 2003: PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY INTERFACES, 2003: 25-28
  • [33] Data integration for plant genomics-exemplars from the integration of Arabidopsis thaliana databases
    Lysenko, Atem
    Hindle, Matthew Morritt
    Taubert, Jan
    Saqi, Mansoor
    Rawlings, Christopher John
    BRIEFINGS IN BIOINFORMATICS, 2009, 10(6): 676-693
  • [34] Ontology based information integration of Heterogeneous resources
    Li, Xia
    Wu, Bei
    PROCEEDINGS OF THE 2009 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND NATURAL COMPUTING, VOL II, 2009: 288+
  • [35] Aggregating Heterogeneous Health Data Through an Ontological Common Health Language
    Kiourtis, Athanasios
    Mavrogiorgou, Argyro
    Kyriazis, Dimosthenis
    2017 10TH INTERNATIONAL CONFERENCE ON DEVELOPMENTS IN ESYSTEMS ENGINEERING (DESE 2017), 2017: 175-181
  • [36] XOQL: OBJECT QUERY MARKUP LANGUAGE
    Oleynik, Pavel P.
    BIZNES INFORMATIKA-BUSINESS INFORMATICS, 2015, 32(2): 30-38
  • [37] A Comprehensive Query Language for Provenance Information
    Abu Jabal, Amani
    Bertino, Elisa
    INTERNATIONAL JOURNAL OF COOPERATIVE INFORMATION SYSTEMS, 2018, 27(3)
  • [38] A Comparative Analysis of Biological Data Integration Systems Famous for Data Exploitation and Knowledge Discovery
    Irshad, Omer
    Khan, Muhammad Usman Ghani
    CURRENT BIOINFORMATICS, 2021, 16(5): 662-681
  • [39] A scoping review of semantic integration of health data and information
    Zhang, Hansi
    Lyu, Tianchen
    Yin, Pengfei
    Bost, Sarah
    He, Xing
    Guo, Yi
    Prosperi, Mattia
    Hogan, Willian R.
    Bian, Jiang
    INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS, 2022, 165
  • [40] Effect of semantic distance on learning structured query language: An empirical study
    Shin, Shin-Shing
    FRONTIERS IN PSYCHOLOGY, 2022, 13