Optimal Ratio of Continuous to Categorical Variables for the Two Group Location Model

Cited by: 0
Authors
Baah, Philemon [1]
Adebanji, Atinuke [1]
Kakaie, Romain Glele [2]
Affiliations
[1] Kwame Nkrumah Univ Sci & Technol, Dept Math, Kumasi, Ghana
[2] Univ Abomey Calavi, Fac Agron Sci, Cotonou, Benin
Keywords
Location model; classification; categorical to continuous variables; contingency table; leave-one-out method
DOI
Not available
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
We investigated the effect of different combinations of $p$ continuous and $q$ categorical variables, and of increasing values of the group centroid separation function ($\delta = 1, 2, 3$), on the performance of the location model for two groups ($\Pi_i$, $i = 1, 2$). The number of predictor variables was 4 or 8, with 1:3, 1:1 and 3:1 as the predetermined ratios for $p : q$. We generated samples from $N(\mu_1, I)$ of sizes 40, 80 and 120 with MATLAB R2007b for the $p$ continuous variables within the $2^q$ binary cells in $\Pi_1$. The size of $\Pi_2$ was determined using the sample ratios 1:1, 1:2, 1:3 and 1:4 for $n_1 : n_2$ within the $2^q$ cells. In the first cell, the mean of the $p$ continuous variables is $\mu_1^{(1)} = 0$ for group 1 and $\mu_2^{(1)} = \delta$ for group 2; in subsequent cells, $\mu_i^{(m+1)} = \mu_i^{(m)} + 1$. Error rates fell more rapidly with increasing $\delta$ than with increasing sample size. The optimal $p : q$ was 3:1, and the model deteriorated at 1:3, with larger variability. The 8-variable model performed better than the 4-variable model for large sample sizes at $p : q = 1:1$ and outperformed it for all sample sizes at $p : q = 3:1$. The results indicate that when the location model is used for classification problems with as many categorical variables as continuous ones (or more), this should be compensated for by a larger separation between the groups and larger sample sizes.
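The data-generating design described in the abstract can be sketched in a few lines. This is only an illustrative Python sketch, not the authors' MATLAB code: the function names are hypothetical, observations are assumed to be spread evenly over the $2^q$ cells, and every one of the $p$ continuous coordinates is assumed to share the same cell mean. Neither assumption is confirmed by the abstract.

```python
import numpy as np

def split_predictors(total, ratio):
    """Split `total` predictors into (p, q) for a p:q ratio such as (3, 1)."""
    p_part, q_part = ratio
    unit = total // (p_part + q_part)
    return p_part * unit, q_part * unit

def cell_means(q, first_mean):
    """Mean for each of the 2**q cells: first-cell mean, then +1 per cell."""
    return first_mean + np.arange(2 ** q, dtype=float)

def simulate_group(n, p, q, first_mean, rng):
    """Return (z, x): q binary variables and p continuous variables for n cases."""
    # Assumption: observations are spread evenly over the 2**q cells.
    cells = np.arange(n) % (2 ** q)
    # Decode each cell index into its q binary categorical variables.
    z = (cells[:, None] >> np.arange(q)) & 1
    # Assumption: all p coordinates share the cell mean; covariance is I.
    mu = cell_means(q, first_mean)[cells]
    x = rng.standard_normal((n, p)) + mu[:, None]
    return z, x

rng = np.random.default_rng(0)
p, q = split_predictors(8, (3, 1))   # 8 predictors at p:q = 3:1
delta = 2.0                          # group centroid separation
n1, n2 = 40, 80                      # sample ratio n1:n2 = 1:2
z1, x1 = simulate_group(n1, p, q, 0.0, rng)    # group 1 starts at mean 0
z2, x2 = simulate_group(n2, p, q, delta, rng)  # group 2 starts at mean delta
```

With 8 predictors at a 3:1 ratio this gives 6 continuous and 2 binary variables, so each group occupies 4 cells. Varying `delta`, `n1`, `n2` and the ratio passed to `split_predictors` walks through the other settings listed in the abstract.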
Pages: 18-26
Number of pages: 9