MISCLASSIFICATION AMONG METHODS USED FOR MULTIPLE GROUP DISCRIMINATION - THE EFFECTS OF DISTRIBUTIONAL PROPERTIES

Cited by: 20
Author
BARON, AE
Affiliation
[1] Department of Preventive Medicine and Biometrics and National Center for American Indian and Alaska Native Mental Health Research, Department of Psychiatry, University of Colorado Health Sciences Center, Denver, Colorado, 80262
DOI
10.1002/sim.4780100511
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Methods of multiple group discriminant analysis have not been fully studied with respect to classification into more than two populations when the covariate distributions are normal or non-normal. The present study examines the classification performance of several multiple discrimination methods under a variety of simulated continuous normal and non-normal covariate distributions. The methods include polychotomous logistic regression, multiple group linear discriminant analysis, kernel density estimation, and rank transformations of the data as input into the linear function. The parameters of interest were distance among populations, configuration of population mean vectors (collinear or forming the vertices of a regular simplex), skewness, kurtosis and bimodality. Simulation of the last three parameters was by log-normal, sinh⁻¹-normal and a two-component mixture of normal distributions, respectively. Results with three trivariate populations show that for all distributions, logistic discrimination classifies close to the optimal under Neyman-Pearson allocation. These results suggest that logistic discrimination is preferable to other widely used methods for multiple group classification with non-normal data, and is comparable to classification by multiple linear discrimination with normal data.
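The simulation design described in the abstract can be illustrated with a minimal sketch. It is not the author's code: all function names, the mixture offset of 2.0, and the sample sizes are hypothetical, and a nearest-sample-mean rule stands in for linear discrimination under an assumed identity covariance with equal priors. The non-normal shapes are drawn as simple transforms of a standard normal deviate and are not standardized, so "distance" here is in raw units.

```python
import math
import random


def group_means(distance, layout):
    """Three mean vectors in 3-D: collinear, or an equilateral triangle
    (the regular simplex for three groups) with the given side length."""
    if layout == "collinear":
        return [(0.0, 0.0, 0.0), (distance, 0.0, 0.0), (2 * distance, 0.0, 0.0)]
    if layout == "simplex":
        return [(0.0, 0.0, 0.0), (distance, 0.0, 0.0),
                (distance / 2, distance * math.sqrt(3) / 2, 0.0)]
    raise ValueError(f"unknown layout: {layout}")


def draw_noise(rng, shape):
    """One univariate deviate with the requested distributional shape."""
    z = rng.gauss(0.0, 1.0)
    if shape == "normal":
        return z
    if shape == "lognormal":   # skewed: log(x) is normal
        return math.exp(z)
    if shape == "sinh":        # heavy-tailed: asinh(x) is normal
        return math.sinh(z)
    if shape == "mixture":     # bimodal: two equally weighted normal components
        return z + (2.0 if rng.random() < 0.5 else -2.0)
    raise ValueError(f"unknown shape: {shape}")


def simulate(rng, means, shape, n_per_group):
    """Return a list of (observation, group label) pairs."""
    data = []
    for label, mu in enumerate(means):
        for _ in range(n_per_group):
            x = tuple(m + draw_noise(rng, shape) for m in mu)
            data.append((x, label))
    return data


def sample_means(data, n_groups):
    """Per-group mean vectors estimated from labelled data."""
    sums = [[0.0, 0.0, 0.0] for _ in range(n_groups)]
    counts = [0] * n_groups
    for x, label in data:
        counts[label] += 1
        for j in range(3):
            sums[label][j] += x[j]
    return [tuple(s / counts[g] for s in sums[g]) for g in range(n_groups)]


def nearest_mean(x, centers):
    """Index of the group whose center is closest in Euclidean distance."""
    return min(range(len(centers)),
               key=lambda g: sum((a - b) ** 2 for a, b in zip(x, centers[g])))


def misclassification_rate(train, test):
    """Fit group centers on train, report the error fraction on test."""
    centers = sample_means(train, 3)
    errors = sum(nearest_mean(x, centers) != label for x, label in test)
    return errors / len(test)
```

Calling `misclassification_rate(simulate(rng, means, shape, 200), simulate(rng, means, shape, 200))` for each combination of layout and shape, with `rng = random.Random(seed)`, gives a rough picture of how error rates shift with the distributional properties listed above.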
Pages: 757-766
Page count: 10