The modified location model for classifying genetic resources: I. Association between categorical and continuous variables

Cited by: 12
Authors
Franco, J
Crossa, J
Affiliations
[1] CIMMYT, Biometr & Stat Unit, Mexico City 06600, DF, Mexico
[2] Univ Republ Oriental Uruguay, Fac Agron, Montevideo, Uruguay
DOI
10.2135/cropsci2002.1719
Chinese Library Classification
S3 [Agronomy (agricultural science)]
Discipline classification code
0901
Abstract
When evaluating genetic resources, data on continuous and categorical attributes of accessions are collected. The Ward-MLM strategy, which combines the Ward method with the modified location model (MLM), classifies genetic resources using a mixture of categorical and continuous variables; it combines all the categorical variables into one multinomial variable (W) that is assumed to be statistically independent of the continuous variables. The main objective of this study was to examine the robustness of the MLM for recovering underlying true subpopulations when independence between the variable W and the continuous variables does not hold. In addition, several scenarios, based on different degrees of overlap of the continuous and discrete variables in simulated data sets, were generated. Results showed that when the subpopulations were well differentiated, the Ward-MLM strategy effectively predicted the true number of subpopulations and fully recovered their structure, regardless of the level of dependence between the W variable and the vector of continuous variables. When the subpopulations showed unclear boundaries and a high degree of overlap in the W variable and in the continuous variables, the Ward-MLM strategy predicted a different number of subpopulations but fully recovered the composition of the subpopulations. In this case, new groups were formed that showed a balanced and consistent structure in their composition as compared with the subpopulations. The MLM proved to be a robust model under medium and strong dependence between the variable W and the vector of continuous variables and under various kinds of overlap between subpopulations with respect to the continuous and discrete variables.
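The step of combining all categorical variables into one multinomial variable W can be illustrated with a minimal Python sketch. It assumes that W simply enumerates every combination of category levels, one cell per combination; the attribute names and accession values are hypothetical and do not come from the paper.

```python
from itertools import product


def combine_categorical(levels_per_variable):
    """Map each combination of categorical levels to one cell index of W.

    Assumption: W has one level per combination of levels, numbered from 1
    in the order produced by itertools.product.
    """
    cells = product(*levels_per_variable)
    return {cell: index for index, cell in enumerate(cells, start=1)}


# Hypothetical categorical attributes: grain colour (2 levels), grain type (3 levels)
levels = [("white", "yellow"), ("dent", "flint", "floury")]
w_index = combine_categorical(levels)

# Hypothetical accessions described by those two attributes
accessions = [("white", "flint"), ("yellow", "dent"), ("white", "flint")]
w_values = [w_index[accession] for accession in accessions]

print(len(w_index))  # number of levels of W: 2 * 3 = 6
print(w_values)      # W cell assigned to each accession
```

Under this assumption, two categorical variables with 2 and 3 levels yield a W with 6 levels, and accessions that share the same combination of categories fall into the same W cell.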
Pages: 1719-1726
Number of pages: 8