Multi-platform gene-expression mining and marker gene analysis

Cited by: 8
Authors
Xu, Qian [2]
Xue, Hong [3, 4]
Yang, Qiang [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Kowloon, Hong Kong, Peoples R China
[2] Hong Kong Univ Sci & Technol, Bioengn Programme, Kowloon, Hong Kong, Peoples R China
[3] Hong Kong Univ Sci & Technol, Dept Biochem, Kowloon, Hong Kong, Peoples R China
[4] Hong Kong Univ Sci & Technol, Appl Genom Ctr, Kowloon, Hong Kong, Peoples R China
Keywords
data mining; bioinformatics; gene-expression data analysis; multi-task learning; SELF-ORGANIZING MAP; MICROARRAY DATA; MOLECULAR CLASSIFICATION; CHROMOSOMAL INSTABILITY; LOGISTIC-REGRESSION; BREAST-CANCER; MODEL; VISUALIZATION; PREDICTORS; SELECTION
DOI
10.1504/IJDMB.2011.043030
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Gene-expression data are now widely available and used for a wide range of clinical and diagnostic purposes. A key challenge is to select a few significant marker genes for biological studies. While it is feasible to find important genes from a single gene-expression data set, it is often more meaningful to compare the results from different but related data sets together, especially for multiple gene-expression data sets arising from different studies of a common organism or phenotype. In this paper, we present a novel framework to exploit the commonalities across different data sets by jointly learning from different data sets simultaneously through multi-task feature learning. By identifying a common subspace of genes, we can help biologists find important marker genes that span different evolutionary periods in the life cycle of cancer development. The genes thus found are more stable and more significant. Our experimental results demonstrate that more accurate models can be built using multiple data sets based on fewer labelled examples. To the best of our knowledge, we are among the first to introduce multi-task learning in the bioinformatics community to solve the lack of data problem.
Pages: 485-503
Number of pages: 19