Novel applications of multitask learning and multiple output regression to multiple genetic trait prediction

Cited by: 45
Authors
He, Dan [1]
Kuhn, David [2]
Parida, Laxmi [1]
Affiliations
[1] IBM TJ Watson Res, Yorktown Hts, NY 10598 USA
[2] USDA ARS, Subtrop Hort Res Stn, 13601 Old Cutler Rd, Miami, FL 33158 USA
Keywords
MARKER-ASSISTED SELECTION; GENOMIC SELECTION
DOI
10.1093/bioinformatics/btw249
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict quantitative trait values by simultaneously modeling all marker effects. Genetic trait prediction is usually formulated as a linear regression model. In many cases, multiple traits are observed for the same set of samples and markers, and some of these traits may be correlated with each other. Modeling the traits together may therefore improve prediction accuracy. In this work, we view the multitrait prediction problem from a machine learning angle: as either a multitask learning problem or a multiple output regression problem, depending on whether the different traits share the same genotype matrix. We adapt multitask learning algorithms and multiple output regression algorithms to solve the multitrait prediction problem, and we propose a few strategies to reduce the least-squares error of the predictions from these algorithms. Our experiments show that modeling multiple traits together can improve prediction accuracy for correlated traits.
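The multiple output regression setting described in the abstract (one shared genotype matrix, several traits) can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a plain multi-output ridge regression baseline on simulated data, and the function names, the 0/1/2 genotype coding, the penalty value and the simulated marker effects are all assumptions made for illustration.

```python
import numpy as np

def fit_multi_output_ridge(X, Y, lam=1.0):
    """Closed-form ridge fit for all traits at once.

    X: (n_samples, n_markers) genotype matrix, Y: (n_samples, n_traits).
    Returns marker effects B of shape (n_markers, n_traits) and intercepts.
    """
    x_mean = X.mean(axis=0)
    y_mean = Y.mean(axis=0)
    Xc = X - x_mean          # center markers so the intercept is not penalized
    Yc = Y - y_mean
    p = X.shape[1]
    B = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ Yc)
    intercept = y_mean - x_mean @ B
    return B, intercept

def predict(X, B, intercept):
    """Predict every trait for every sample."""
    return X @ B + intercept

# Simulated data (hypothetical): 200 samples, 50 biallelic markers coded 0/1/2.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.integers(0, 3, size=(n, p)).astype(float)

# Two traits whose marker effects share a common component, so they correlate.
shared = rng.normal(size=p)
effects = np.column_stack([
    shared + 0.3 * rng.normal(size=p),
    shared + 0.3 * rng.normal(size=p),
])
Y = X @ effects + rng.normal(scale=1.0, size=(n, 2))

# Hold out the last 50 samples and report the per-trait squared error.
train, test = slice(0, 150), slice(150, None)
B, b0 = fit_multi_output_ridge(X[train], Y[train], lam=5.0)
pred = predict(X[test], B, b0)
mse = ((pred - Y[test]) ** 2).mean(axis=0)
print("per-trait test MSE:", mse)
```

With a single shared penalty, this baseline gives the same marker effects as fitting each trait separately, so it does not by itself exploit trait correlation. It only shows the data layout and the least-squares objective that multitask and multiple output methods build on by coupling the traits.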
Pages: 37-43
Number of pages: 7