Preselection statistics and Random Forest classification identify population informative single nucleotide polymorphisms in cosmopolitan and autochthonous cattle breeds

Cited by: 34
Authors
Bertolini, F. [1]
Galimberti, G. [2]
Schiavo, G. [1]
Mastrangelo, S. [3]
Di Gerlando, R. [3]
Strillacci, M. G. [4]
Bagnato, A. [4]
Portolano, B. [3]
Fontanesi, L. [1]
Affiliations
[1] Univ Bologna, Dept Agr & Food Sci, Div Anim Sci, Viale Fanin 46, I-40127 Bologna, Italy
[2] Univ Bologna, Dept Stat Paolo Fortunati, Via Belle Arti 41, I-40126 Bologna, Italy
[3] Univ Palermo, Dept Agr & Forestry Sci, Viale Sci, I-90128 Palermo, Italy
[4] Univ Milan, Dept Vet Med, Via Celoria 10, I-20133 Milan, Italy
Keywords
SNP; breed assignment; Random Forest; Bos taurus; WHOLE-GENOME ASSOCIATION; TRAITS; GENE; MARKERS; QTL; IDENTIFICATION
DOI
10.1017/S1751731117001355
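Chinese Library Classification number
S8 [Animal husbandry, veterinary medicine, hunting, sericulture, apiculture]
Discipline classification code
0905
Abstract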
Commercial single nucleotide polymorphism (SNP) arrays have recently been developed for several species and can be used to identify informative markers that differentiate breeds or populations for several downstream applications. A few statistical approaches have been proposed to identify the most discriminating genetic markers among thousands of genotyped SNPs. In this work, we compared several SNP preselection methods (Delta, Fst and principal component analysis (PCA)), in addition to Random Forest classification, to analyse SNP data from six dairy cattle breeds. These included cosmopolitan breeds (Holstein, Brown and Simmental) and autochthonous Italian breeds raised in two different regions and subjected to limited or no breeding programmes (Cinisara and Modicana, raised only in Sicily, and Reggiana, raised only in Emilia Romagna). From these classifications, two panels of 96 and 48 SNPs containing the most discriminant SNPs were created for each preselection method. These panels were evaluated for their ability to discriminate the breeds as a whole and breed by breed, as well as for linkage disequilibrium within each panel. The results showed that for the 48-SNP panels the error rate increased mainly for autochthonous breeds, probably as a consequence of their admixed origin, lower selection pressure and ascertainment bias in the construction of the SNP chip. The 96-SNP panels were generally better able to discriminate all breeds. The panel derived by PCA-chrom (obtained by preselection chromosome by chromosome) identified informative SNPs that were particularly useful for the assignment of minor breeds. It reached the lowest Out Of Bag error, even for Cinisara, whose error was quite high in all other panels. Moreover, this panel also contained the lowest number of SNPs in linkage disequilibrium. Several selected SNPs are located near genes affecting breed-specific phenotypic traits (coat colour and stature) or associated with production traits.
In general, our results demonstrated the usefulness of Random Forest in combination with other reduction techniques to identify population informative SNPs.
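The abstract names Delta and Fst as preselection statistics but gives no formulas, so the following is only a minimal sketch under stated assumptions: Delta is taken as the absolute allele-frequency difference between two breeds, and Fst as Wright's two-population form, (H_T - H_S) / H_T, with equal weights. The function names, the 0/1/2 genotype coding and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List

# Assumption: each genotype is the count (0, 1 or 2) of one chosen allele per animal.
Genotypes = List[int]


def allele_freq(genotypes: Genotypes) -> float:
    """Frequency of the counted allele in one breed (two alleles per animal)."""
    return sum(genotypes) / (2 * len(genotypes))


def delta(p1: float, p2: float) -> float:
    """Absolute allele-frequency difference between two breeds."""
    return abs(p1 - p2)


def fst(p1: float, p2: float) -> float:
    """Wright's Fst for two equally weighted populations: (H_T - H_S) / H_T."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity in the pooled population
    if h_t == 0:
        return 0.0  # SNP is monomorphic in both breeds, hence uninformative
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-breed heterozygosity
    return (h_t - h_s) / h_t


def rank_snps(
    breed_a: Dict[str, Genotypes],
    breed_b: Dict[str, Genotypes],
    stat: Callable[[float, float], float] = delta,
) -> List[str]:
    """Return SNP names sorted from most to least discriminating under `stat`."""
    scores = {
        snp: stat(allele_freq(breed_a[snp]), allele_freq(breed_b[snp]))
        for snp in breed_a
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy genotypes, invented for illustration only.
breed_a = {"snp1": [2, 2, 2, 2], "snp2": [1, 1, 0, 2], "snp3": [0, 1, 0, 0]}
breed_b = {"snp1": [0, 0, 0, 0], "snp2": [1, 0, 2, 1], "snp3": [2, 2, 1, 2]}

ranking = rank_snps(breed_a, breed_b)
print(ranking)
```

In a workflow like the one the abstract describes, the top-ranked SNPs from such a preselection step would then be passed to a Random Forest classifier, and the panel would be judged by its Out Of Bag error.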
Pages: 12-19
Number of pages: 8