Metric learning on expression data for gene function prediction

Cited by: 13
Authors
Makrodimitris, Stavros [1, 2]
Reinders, Marcel J. T. [1, 3]
van Ham, Roeland C. H. J. [1, 2]
Affiliations
[1] Delft Univ Technol, Delft Bioinformat Lab, NL-2628 XE Delft, Netherlands
[2] Keygene NV, NL-6708 PW Wageningen, Netherlands
[3] Leiden Univ, Leiden Computat Biol Ctr, Med Ctr, NL-2333 ZC Leiden, Netherlands
Keywords
REGRESSION; ALGORITHM; SELECTION; ENSEMBLE
DOI
10.1093/bioinformatics/btz731
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Co-expression of two genes across different conditions is indicative of their involvement in the same biological process. However, when using RNA-Seq datasets with many experimental conditions from diverse sources, only a subset of the experimental conditions is expected to be relevant for finding genes related to a particular Gene Ontology (GO) term. Therefore, we hypothesize that when the purpose is to find similarly functioning genes, the co-expression of genes should not be determined on all samples but only on those samples informative for the GO term of interest. Results: To address this, we developed Metric Learning for Co-expression (MLC), a fast algorithm that assigns a GO-term-specific weight to each expression sample. The goal is to obtain a weighted co-expression measure that is more suitable than the unweighted Pearson correlation for applying Guilt-By-Association-based function predictions. More specifically, if two genes are annotated with a given GO term, MLC tries to maximize their weighted co-expression and, in addition, if one of them is not annotated with that term, the weighted co-expression is minimized. Our experiments on publicly available Arabidopsis thaliana RNA-Seq data demonstrate that MLC outperforms standard Pearson correlation in term-centric performance. Moreover, our method is particularly good at more specific terms, which are the most interesting. Finally, by observing the sample weights for a particular GO term, one can identify which experiments are important for learning that term and potentially identify novel conditions that are relevant, as demonstrated by experiments in both A. thaliana and Pseudomonas aeruginosa.
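To make the idea of a sample-weighted co-expression measure concrete, the following is a minimal sketch of a weighted Pearson correlation between two expression profiles. It is not the authors' implementation: the function name `weighted_pearson`, the toy expression values, and the hand-picked weights are all hypothetical, and it is an assumption that a weighted Pearson correlation of this form is close to the measure MLC optimizes. The sketch only evaluates the measure for given weights; it does not learn them.

```python
import numpy as np


def weighted_pearson(x, y, w):
    """Weighted Pearson correlation of two expression profiles.

    x, y : expression values of two genes over the same samples.
    w    : non-negative sample weights (hypothetical here; in the
           abstract's setting they would be learned per GO term).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                      # normalize weights to sum to 1

    mx = np.sum(w * x)                   # weighted means
    my = np.sum(w * y)
    cov = np.sum(w * (x - mx) * (y - my))
    vx = np.sum(w * (x - mx) ** 2)       # weighted variances
    vy = np.sum(w * (y - my) ** 2)
    return cov / np.sqrt(vx * vy)


# Toy data (made up): the two genes move together in the first three
# samples and diverge in the last three.
x = [1.0, 2.0, 3.0, 5.0, 1.0, 4.0]
y = [2.0, 4.0, 6.0, 1.0, 5.0, 2.0]

uniform = np.ones(6)                     # ordinary Pearson correlation
informative = np.array([1, 1, 1, 0, 0, 0], dtype=float)

print(weighted_pearson(x, y, uniform))
print(weighted_pearson(x, y, informative))
```

With uniform weights the toy genes are negatively correlated, while putting all weight on the first three samples gives a correlation of 1 up to floating-point error. This illustrates why restricting or reweighting samples can change which genes look co-expressed.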
Pages: 1182-1190
Page count: 9