Some results on the maximal correlation in 2 x k contingency tables

Cited by: 6
Authors
Gautam, S [1]
Kimeldorf, G
Affiliations
[1] Vanderbilt Univ, Sch Med, Dept Prevent Med, Div Biostat,Med Ctr N A1124, Nashville, TN 37232 USA
[2] Univ Texas, Richardson, TX 75083 USA
Keywords
dual scaling; nominal categorical data; optimal scaling
DOI
10.2307/2686053
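Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract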
For 2 x k contingency tables, we consider the statistic r*, the maximal correlation between the row and column variables, where the maximum is taken over all possible sets of scores (or "scales" or "weights") assigned to the k categories. For general m x k contingency tables, methods involving the maximization over sets of scores assigned to the categories (called dual-scaling methods) have been criticized for lack of statistical interpretation and for difficulty of computation. For the case m = 2, however, where nominal categorical data on two populations are compared, this article shows that r* has meaningful interpretations as a multiple correlation coefficient, as a numerical measure of association, and as an upper bound on correlation for reduced tables. These interpretations lead to a better understanding of the nature of the association between the two variables. These interpretations also yield insight into the role of the usual chi-square statistic for 2 x k tables. Furthermore, both r* and the set of scores at which this maximum is achieved are shown to have simple closed-form expressions. These scores are used to furnish a simple proof that the asymptotic distribution of n(r*)^2, based on a sample of size n, is a chi-square distribution with k - 1 degrees of freedom.
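The abstract does not reproduce the closed-form expressions, so the following is only an illustrative sketch and not necessarily the form the authors derive. It assumes the standard result that, in a 2 x k table, the correlation between the row indicator and the column scores is maximized when each column's score is the observed proportion of row 1 within that column (or any increasing linear transformation of it), and that n(r*)^2 then coincides with the usual Pearson chi-square statistic. The function names `maximal_correlation_2xk` and `pearson_chi2` are hypothetical helpers introduced here for illustration.

```python
import numpy as np


def maximal_correlation_2xk(table):
    """Return (r_star, scores) for a 2 x k table of counts.

    Assumption (not taken from the abstract): the optimal score for
    column j is the proportion of row 1 within column j.
    """
    t = np.asarray(table, dtype=float)
    if t.ndim != 2 or t.shape[0] != 2:
        raise ValueError("expected a 2 x k table of counts")
    n = t.sum()
    col_tot = t.sum(axis=0)
    if np.any(col_tot == 0):
        raise ValueError("every column needs at least one observation")

    scores = t[0] / col_tot          # P(row 1 | column j)
    p1 = t[0].sum() / n              # marginal proportion of row 1

    # Moments of the row indicator x (1 for row 1, 0 for row 2)
    # and of the column score s, taken over all n observations.
    mean_s = (col_tot * scores).sum() / n
    var_s = (col_tot * (scores - mean_s) ** 2).sum() / n
    var_x = p1 * (1.0 - p1)
    cov_xs = (t[0] * scores).sum() / n - p1 * mean_s

    if var_s == 0.0 or var_x == 0.0:
        # Identical column profiles or an empty row: no association.
        return 0.0, scores
    return cov_xs / np.sqrt(var_x * var_s), scores


def pearson_chi2(table):
    """Usual Pearson chi-square statistic for a table of counts."""
    t = np.asarray(table, dtype=float)
    expected = np.outer(t.sum(axis=1), t.sum(axis=0)) / t.sum()
    return ((t - expected) ** 2 / expected).sum()


if __name__ == "__main__":
    counts = [[10, 20, 30], [30, 20, 10]]
    r_star, scores = maximal_correlation_2xk(counts)
    n = np.sum(counts)
    print("scores:", scores)
    print("n * r*^2:", n * r_star ** 2)
    print("Pearson chi-square:", pearson_chi2(counts))
```

On this made-up table the last two printed values agree, which is consistent with the abstract's remark that r* sheds light on the role of the chi-square statistic for 2 x k tables.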
Pages: 336-341
Page count: 6