Heterogeneous biological data integration with declarative query language

Cited by: 4
Authors
Nguyen, H. [1]
Michel, L. [2]
Thompson, J. D. [3]
Poch, O. [3]
Affiliations
[1] IGBMC, F-67404 Illkirch Graffenstaden, France
[2] Observ Astron, Equipe Hautes Energies, Strasbourg, France
[3] ICube UMR7357, Fac Med, F-67085 Strasbourg, France
Keywords
SYSTEM; BIOINFORMATICS; MANAGEMENT; DATABASES; FRAMEWORK; RESOURCE
DOI
10.1147/JRD.2014.2309032
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
The need for scalable data integration systems in modern biology is indisputable, given the very large, heterogeneous, and complex datasets available in public databases. Managing and fusing this "big data" with local databases is a major challenge, since it underlies the computational inferences and models that will subsequently be generated and validated experimentally. In this paper, we present an alternative conception for local data integration, called BIRD (Biological Integration and Retrieval Data), based on four concepts: (i) a hybrid flat file and relational database architecture permits the rapid management of large volumes of heterogeneous datasets; (ii) a generic data model allows the simultaneous organization and classification of local databases according to real-world requirements; (iii) configuration rules are used to divide and map each data resource into several data model entities; and (iv) a simple, declarative query language (BIRD-QL) facilitates information extraction from heterogeneous datasets. This flexible, generic design allows the integration of diverse data formats in a searchable database with high-level functionalities depending on the specific scientific context. It has been validated in the context of real-world projects, notably the SM2PH (Structural Mutation to the Phenotypes of Human Pathologies) project.
Pages: 12
Related papers
50 in total
  • [41] The Spoofax Language Workbench: Rules for Declarative Specification of Languages and IDEs
    Kats, Lennart C. L.
    Visser, Eelco
    ACM SIGPLAN NOTICES, 2010, 45 (10) : 444 - 463
  • [42] Functional declarative language design and predicate calculus: A practical approach
    Boute, R
    ACM TRANSACTIONS ON PROGRAMMING LANGUAGES AND SYSTEMS, 2005, 27 (05) : 988 - 1047
  • [43] Heterogeneous data source integration for smart grid ecosystems based on metadata mining
    Guerrero, Juan I.
    Garcia, Antonio
    Personal, Enrique
    Luque, Joaquin
    Leon, Carlos
    EXPERT SYSTEMS WITH APPLICATIONS, 2017, 79 : 254 - 268
  • [44] A weighted Bayesian integration method for predicting drug combination using heterogeneous data
    Li, Tingting
    Xiao, Long
    Geng, Haigang
    Chen, Anqi
    Hu, Yue-Qing
    JOURNAL OF TRANSLATIONAL MEDICINE, 2024, 22 (01)
  • [45] Research on heterogeneous data integration model of group enterprise based on cluster computing
    Zhou, Qingyuan
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2016, 19 (03) : 1275 - 1282
  • [46] Enriching Nanomaterials Omics Data: An Integration Technique to Generate Biological Descriptors
    Tsiliki, Georgia
    Nymark, Penny
    Kohonen, Pekka
    Grafstrom, Roland
    Sarimveis, Haralambos
    SMALL METHODS, 2017, 1 (11)
  • [47] Prospective Data Model and Distributed Query Processing for Mobile Sensing Data Streams
    Brahem, Mariem
    Zeitouni, Karine
    Yeh, Laurent
    El Hafyani, Hafsa
    MULTIPLE-ASPECT ANALYSIS OF SEMANTIC TRAJECTORIES, 2020, 11889 : 66 - 82
  • [48] S3QL: A distributed domain specific language for controlled semantic integration of life sciences data
    Deus, Helena F.
    Correa, Miria C.
    Stanislaus, Romesh
    Miragaia, Maria
    Maass, Wolfgang
    de Lencastre, Hermínia
    Fox, Ronan
    Almeida, Jonas S.
    BMC BIOINFORMATICS, 2011, 12
  • [49] Minimizing the cross validation error to mix kernel matrices of heterogeneous biological data
    Tsuda, K
    Uda, S
    Kin, T
    Asai, K
    NEURAL PROCESSING LETTERS, 2004, 19 (01) : 63 - 72
  • [50] Minimizing the Cross Validation Error to Mix Kernel Matrices of Heterogeneous Biological Data
    Koji Tsuda
    Shinsuke Uda
    Taishin Kin
    Kiyoshi Asai
    Neural Processing Letters, 2004, 19 : 63 - 72