Preselection statistics and Random Forest classification identify population informative single nucleotide polymorphisms in cosmopolitan and autochthonous cattle breeds

Times cited: 30
Authors
Bertolini, F. [1]
Galimberti, G. [2]
Schiavo, G. [1]
Mastrangelo, S. [3]
Di Gerlando, R. [3]
Strillacci, M. G. [4]
Bagnato, A. [4]
Portolano, B. [3]
Fontanesi, L. [1]
Affiliations
[1] Univ Bologna, Dept Agr & Food Sci, Div Anim Sci, Viale Fanin 46, I-40127 Bologna, Italy
[2] Univ Bologna, Dept Stat Paolo Fortunati, Via Belle Arti 41, I-40126 Bologna, Italy
[3] Univ Palermo, Dept Agr & Forestry Sci, Viale Sci, I-90128 Palermo, Italy
[4] Univ Milan, Dept Vet Med, Via Celoria 10, I-20133 Milan, Italy
Keywords
SNP; breed assignment; Random Forest; Bos taurus; WHOLE-GENOME ASSOCIATION; TRAITS; GENE; MARKERS; QTL; IDENTIFICATION
DOI
10.1017/S1751731117001355
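Chinese Library Classification number
S8 [Animal husbandry, veterinary medicine, hunting, sericulture, apiculture]
Subject classification code
0905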
Abstract
Commercial single nucleotide polymorphism (SNP) arrays have recently been developed for several species and can be used to identify informative markers that differentiate breeds or populations for various downstream applications. To identify the most discriminating genetic markers among thousands of genotyped SNPs, a few statistical approaches have been proposed. In this work, we compared several SNP preselection methods (Delta, FST and principal component analysis (PCA)), combined with Random Forest classification, to analyse SNP data from six dairy cattle breeds: cosmopolitan breeds (Holstein, Brown and Simmental) and autochthonous Italian breeds raised in two different regions and subjected to limited or no breeding programmes (Cinisara and Modicana, raised only in Sicily, and Reggiana, raised only in Emilia Romagna). From these classifications, two panels, containing the 96 and the 48 most discriminant SNPs, were created for each preselection method. These panels were evaluated for their ability to discriminate the breeds, both as a whole and breed by breed, and for the linkage disequilibrium within each panel. The results showed that with the 48-SNP panels the error rate increased mainly for the autochthonous breeds, probably as a consequence of their admixed origin, lower selection pressure and ascertainment bias in the construction of the SNP chip. The 96-SNP panels were generally better able to discriminate all breeds. The panel derived by PCA-chrom (obtained by preselecting SNPs chromosome by chromosome) identified informative SNPs that were particularly useful for assigning the minor breeds; it reached the lowest out-of-bag (OOB) error even for Cinisara, whose error was quite high with all the other panels. Moreover, this panel also contained the lowest number of SNPs in linkage disequilibrium. Several selected SNPs are located near genes affecting breed-specific phenotypic traits (coat colour and stature) or associated with production traits.
In general, our results demonstrated the usefulness of Random Forest in combination with other reduction techniques to identify population-informative SNPs.
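The abstract names Delta and FST as per-SNP preselection statistics but does not define them. The sketch below shows one common way such statistics can be computed from genotypes and used to rank SNPs before a Random Forest step. It is an illustration under stated assumptions, not the authors' pipeline: genotypes are assumed to be coded 0/1/2 (copies of one allele) with None for missing calls, Delta is assumed to be the largest pairwise allele-frequency difference between breeds, and FST is a simple unweighted Nei-style ratio (HT - HS) / HT rather than, for example, the Weir and Cockerham estimator. The helper `top_snps` and the toy data are hypothetical. PCA preselection and the Random Forest classification itself are not covered.

```python
"""Sketch of per-SNP preselection statistics (Delta and FST).

Assumptions (not taken from the paper): genotypes are 0/1/2 counts of one
allele, missing calls are None, Delta is the largest pairwise allele-frequency
difference between breeds, and FST is the unweighted ratio (HT - HS) / HT.
"""
from itertools import combinations


def allele_freq(genotypes):
    """Frequency of the counted allele among non-missing 0/1/2 genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    return sum(called) / (2 * len(called))


def delta(freqs):
    """Largest absolute allele-frequency difference over all breed pairs.

    Requires at least two breeds (max() of an empty sequence raises).
    """
    return max(abs(a - b) for a, b in combinations(freqs, 2))


def fst(freqs):
    """Unweighted (HT - HS) / HT across breeds; 0.0 for a monomorphic SNP."""
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled breeds
    if h_t == 0:
        return 0.0
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-breed
    return (h_t - h_s) / h_t


def top_snps(genotypes_by_snp, stat, k):
    """Hypothetical helper: keep the k SNPs with the highest statistic.

    genotypes_by_snp maps SNP id -> {breed: [genotypes]}. The result is a
    preselected candidate set to be passed on to a classifier, not a panel.
    """
    scores = {}
    for snp, by_breed in genotypes_by_snp.items():
        freqs = [allele_freq(g) for g in by_breed.values()]
        scores[snp] = stat(freqs)
    return sorted(scores, key=lambda s: scores[s], reverse=True)[:k]


if __name__ == "__main__":
    # Toy genotypes for three hypothetical breeds A, B and C.
    toy = {
        "snp1": {"A": [0, 0, 0, 0], "B": [2, 2, 2, 2], "C": [1, 1, 1, 1]},
        "snp2": {"A": [1, 1, 1, 1], "B": [1, 1, 1, 1], "C": [1, 1, 1, 1]},
        "snp3": {"A": [0, 1, 0, 1], "B": [1, 2, 1, 2], "C": [0, 0, 1, 1]},
    }
    print(top_snps(toy, fst, 2))    # ['snp1', 'snp3']
    print(top_snps(toy, delta, 2))  # ['snp1', 'snp3']
```

In a workflow like the one the abstract outlines, the preselected SNPs would then be fed to a Random Forest classifier, and the final panels chosen from its results. If scikit-learn were used for that step (an assumption), `RandomForestClassifier(oob_score=True)` exposes the out-of-bag accuracy as `oob_score_`, so the OOB error would be `1 - oob_score_`.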
Pages: 12-19
Number of pages: 8