Computational method for discovery of biomarker signatures from large, complex data sets

Times cited: 2
Authors
Makarov, Vladimir [1, 2]
Gorlin, Alex [2]
Affiliations
[1] Calif State Univ Channel Isl, Camarillo, CA 93012 USA
[2] IFXworks LLC, 2915 Columbia Pike, Arlington, VA 22204 USA
Keywords
Biomarker; Microarray; Gene expression; Chemical; Classification; TRANSLATIONAL BIOINFORMATICS; SELECTION; CLASSIFICATION;
DOI
10.1016/j.compbiolchem.2018.07.008
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Subject classification codes
07; 0710; 09;
Abstract
We present an efficient method for identifying reliable biomarker panels from large multivariate data sets of the kind that typically result from experiments monitoring changes in RNA, small-molecule, or protein abundance. Our computational methodology is developed and validated on the toxicogenomics database Drug Matrix, which in its largest category contains 1656 recognition targets, each characterized by the toxicant, dose, and time (or duration) of exposure. We were able to recognize both individual experimental conditions (compound, dose, and time combinations) and cases where the dose and time values fall within the intervals covered by the training data but do not match the training data exactly. Including gene expression information for multiple organs improved recognition accuracy. Taking time-response information into account allowed us to develop particularly accurate marker panels for a large number of targets: we were able to recognize 176 compounds (out of 316) at greater than 90% accuracy. The presented methodology has an immediate application in the discovery of diagnostic biomarker panels for exposure to various toxicity hazards, and may also be useful for developing biological markers for medical applications.
Pages: 161-168
Page count: 8