A hybrid and exploratory approach to knowledge discovery in metabolomic data

Cited by: 9
Authors
Grissa, Dhouha [1, 4]
Comte, Blandine [1]
Petera, Melanie [2]
Pujos-Guillot, Estelle [1]
Napoli, Amedeo [3]
Affiliations
[1] Univ Clermont Auvergne, INRA, UNH, Mapping, F-63000 Clermont Ferrand, France
[2] Univ Clermont Auvergne, INRA, UNH, Plateforme Explorat Metab, MetaboHUB Clermont, F-63000 Clermont Ferrand, France
[3] Univ Lorraine, CNRS, INRIA, LORIA, F-54000 Nancy, France
[4] Univ Copenhagen, Novo Nordisk Fdn, Ctr Prot Res, Blegdamsvej 3B, DK-2200 Copenhagen, Denmark
Keywords
Hybrid knowledge discovery; Pattern mining; Formal concept analysis; Data and pattern exploration; Metabolomic data; Classification; Visualization; Interpretation; Feature selection
DOI
10.1016/j.dam.2018.11.025
Chinese Library Classification (CLC) number
O29 [Applied mathematics]
Discipline classification code
070104
Abstract
In this paper, we propose a hybrid and exploratory knowledge discovery approach for analyzing complex metabolomic data, based on a combination of supervised classifiers, pattern mining, and Formal Concept Analysis (FCA). The approach relies on three main operations: preprocessing, classification, and postprocessing. Classifiers are applied to datasets of the form individuals × features and produce sets of ranked features, which are then analyzed further. Pattern mining and FCA provide a complementary analysis and support for visualization. A practical application of this framework is presented in the context of metabolomic data, where two interrelated problems are considered: discrimination and prediction of class membership. The dataset is characterized by a small set of individuals and a large set of features, in which predictive biomarkers of clinical outcomes should be identified. The problems of combining numerical and symbolic data mining methods, as well as discrimination and prediction, are detailed and discussed. Moreover, it appears that visualization based on FCA can be used both for guiding knowledge discovery and for interpretation by domain analysts. (C) 2019 Elsevier B.V. All rights reserved.
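The abstract does not spell out how the ranked features are passed to FCA, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a binary formal context in which each feature (object) is related to the classifiers (attributes) that ranked it among their top features. The feature names `f1` to `f4`, the classifier labels `RF`, `SVM`, and `ANOVA`, and the table values are hypothetical. The code enumerates all formal concepts of this toy context by brute force, which is feasible only for very small contexts.

```python
from itertools import combinations

# Hypothetical toy context (an assumption, not data from the paper):
# each feature maps to the set of classifiers that ranked it in their top-k.
context = {
    "f1": {"RF", "SVM", "ANOVA"},
    "f2": {"RF", "SVM"},
    "f3": {"ANOVA"},
    "f4": {"RF"},
}


def common_attributes(objects, context, all_attributes):
    """Return the attributes shared by every object in `objects`."""
    shared = set(all_attributes)
    for obj in objects:
        shared &= context[obj]
    return shared


def objects_having(attributes, context):
    """Return the objects that possess every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) by brute force.

    For each subset of objects, derive its shared attributes (the intent),
    then derive the objects having all of those attributes (the extent).
    Every such pair is a formal concept; duplicates collapse in the set.
    """
    all_attributes = set().union(*context.values())
    objects = sorted(context)
    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset, context, all_attributes)
            extent = objects_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


concepts = formal_concepts(context)

# Print concepts from the most general (largest extent) to the most specific.
for extent, intent in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

In such a lattice, a concept with a large intent groups the features on which many classifiers agree, which is one plausible way to read the visualization role that the abstract gives to FCA. For realistic contexts, a dedicated algorithm such as NextClosure would replace the exponential enumeration used here.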
Pages: 103-116
Page count: 14
Related papers
  • [1] Exploratory knowledge discovery over Web of Data
    Alam, Mehwish
    Buzmakov, Aleksey
    Napoli, Amedeo
    DISCRETE APPLIED MATHEMATICS, 2018, 249 : 2 - 17
  • [2] Feature Selection Methods for Early Predictive Biomarker Discovery Using Untargeted Metabolomic Data
    Grissa, Dhouha
    Petera, Melanie
    Brandolini, Marion
    Napoli, Amedeo
    Comte, Blandine
    Pujos-Guillot, Estelle
    FRONTIERS IN MOLECULAR BIOSCIENCES, 2016, 3
  • [3] MeMo: a hybrid SQL/XML approach to metabolomic data management for functional genomics
    Irena Spasić
    Warwick B Dunn
    Giles Velarde
    Andy Tseng
    Helen Jenkins
    Nigel Hardy
    Stephen G Oliver
    Douglas B Kell
    BMC Bioinformatics, 7
  • [4] Knowledge discovery in data sets with graded attributes
    Glodeanu, Cynthia Vera
    INTERNATIONAL JOURNAL OF GENERAL SYSTEMS, 2016, 45 (02) : 232 - 249
  • [5] Prov-Dominoes: An approach for knowledge discovery from provenance data
    Alencar, Victor
    Kohwalter, Troy
    Braganholo, Vanessa
    Da Silva Junior, Jose Ricardo
    Murta, Leonardo
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 245
  • [6] Knowledge discovery in astronomical data
    Zhang, Yanxia
    Zheng, Hongwen
    Zhao, Yongheng
    ADVANCED SOFTWARE AND CONTROL FOR ASTRONOMY II, PTS 1 & 2, 2008, 7019
  • [7] Data mining and knowledge discovery
    Trybula, WJ
    ANNUAL REVIEW OF INFORMATION SCIENCE AND TECHNOLOGY, 1997, 32 : 197 - 229
  • [8] An Improved Evolutionary Algorithm for Data Mining and Knowledge Discovery
    Al Duhayyim, Mesfer
    Marzouk, Radwa
    Al-Wesabi, Fahd N.
    Alrajhi, Maram
    Hamza, Manar Ahmed
    Zamani, Abu Sarwar
    CMC-COMPUTERS MATERIALS & CONTINUA, 2022, 71 (01) : 1233 - 1247
  • [9] A systematic map of medical data preprocessing in knowledge discovery
    Idri, A.
    Benhar, H.
    Fernandez-Aleman, J. L.
    Kadi, I.
    COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2018, 162 : 69 - 85
  • [10] A hybrid CI-based knowledge discovery system on microarray gene expression data
    Tang, YC
    He, YC
    Zhang, YQ
    Huang, Z
    Hu, XH
    Sunderraman, R
    PROCEEDINGS OF THE 2005 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, 2005 : 25 - 30