Granularity refined by knowledge: contingency tables and rough sets as tools of discovery

Times cited: 1
Author
Zytkow, JM [1]
Affiliation
[1] Univ N Carolina, Dept Comp Sci, Charlotte, NC 28223 USA
Source
DATA MINING AND KNOWLEDGE DISCOVERY: THEORY, TOOLS, AND TECHNOLOGY II | 2000 / Vol. 4057
Keywords
knowledge discovery; knowledge refinement; automated discovery; granularity; indiscernibility; approximation; contingency tables; rough sets
DOI
10.1117/12.381720
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Contingency tables represent data in a granular way and are a well-established tool for inductive generalization of knowledge from data. We show that the basic concepts of rough sets, such as concept approximation, indiscernibility, and reduct, can be expressed in the language of contingency tables. We further demonstrate the relevance to rough set theory of the additional probabilistic information available in contingency tables, in particular statistical tests of significance and of predictive strength applied to contingency tables. Tests of both types can aid the evaluation mechanisms used in inductive generalization based on rough sets. The granularity of attributes can be improved through feedback from knowledge discovered in data. We demonstrate how 49er's facilities for (1) contingency table refinement, (2) column and row grouping based on correspondence analysis, and (3) the search for equivalence relations between attributes improve both the granularization of attributes and the quality of knowledge. Finally, we demonstrate the limitations of knowledge viewed as concept approximation, which is the focus of rough sets. Transcending that focus and reorienting towards predictive knowledge, and towards the related distinction between possible and impossible (or statistically improbable) situations, will be very useful in expanding the rough sets approach to more expressive forms of knowledge.
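To make the correspondence between rough sets and contingency tables concrete, here is a minimal Python sketch under stated assumptions; it is not 49er's implementation. It assumes a table whose rows are the joint values of the condition attributes (so each row is one indiscernibility class) and whose columns are decision values. The lower approximation of a concept is then the set of rows whose objects all fall in the concept column, the upper approximation is the set of rows with any object in that column, and a (relative) reduct is taken here as a minimal attribute subset whose table has the same positive region as the full attribute set. The toy data, the function names, and the choice of Pearson's chi-square and Cramér's V as the significance and predictive-strength measures are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy data: three condition attributes and a yes/no decision "play".
OBJECTS = [
    {"outlook": "sunny", "windy": "no", "temp": "hot", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "temp": "hot", "play": "no"},
    {"outlook": "rain", "windy": "no", "temp": "mild", "play": "yes"},
    {"outlook": "rain", "windy": "no", "temp": "mild", "play": "no"},
    {"outlook": "overcast", "windy": "yes", "temp": "mild", "play": "yes"},
    {"outlook": "overcast", "windy": "no", "temp": "hot", "play": "yes"},
]


def contingency_table(objects, attrs, decision):
    """Rows: joint values of `attrs`; columns: values of `decision`; cells: counts.

    Each row gathers objects that agree on every attribute in `attrs`,
    i.e. one indiscernibility class (granule) of the rough set approach.
    """
    table = defaultdict(Counter)
    for obj in objects:
        table[tuple(obj[a] for a in attrs)][obj[decision]] += 1
    return table


def approximations(table, concept):
    """Lower and upper approximation of `decision == concept`, as sets of rows."""
    # Lower: every object in the row falls in the concept column.
    lower = {row for row, cells in table.items() if cells[concept] == sum(cells.values())}
    # Upper: at least one object in the row falls in the concept column.
    upper = {row for row, cells in table.items() if cells[concept] > 0}
    return lower, upper


def positive_region_size(table):
    """Objects in rows with a single non-empty decision column (only counted keys exist)."""
    return sum(sum(cells.values()) for cells in table.values() if len(cells) == 1)


def reducts(objects, attrs, decision):
    """Minimal attribute subsets whose table has the same positive region as `attrs`."""
    target = positive_region_size(contingency_table(objects, attrs, decision))
    found = []
    for k in range(1, len(attrs) + 1):
        for subset in combinations(attrs, k):
            if any(set(r) <= set(subset) for r in found):
                continue  # supersets of a reduct are not minimal
            if positive_region_size(contingency_table(objects, subset, decision)) == target:
                found.append(subset)
    return found


def chi_square_and_cramers_v(table):
    """Pearson's chi-square (significance) and Cramér's V (strength) for a table."""
    rows = list(table)
    cols = sorted({c for cells in table.values() for c in cells})
    n = sum(sum(cells.values()) for cells in table.values())
    row_tot = {r: sum(table[r].values()) for r in rows}
    col_tot = {c: sum(table[r][c] for r in rows) for c in cols}
    chi2 = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / n
            chi2 += (table[r][c] - expected) ** 2 / expected
    k = min(len(rows), len(cols))
    v = math.sqrt(chi2 / (n * (k - 1))) if k > 1 else 0.0
    return chi2, v


if __name__ == "__main__":
    attrs = ("outlook", "windy", "temp")
    lower, upper = approximations(contingency_table(OBJECTS, attrs, "play"), "yes")
    print("boundary:", sorted(upper - lower))  # boundary: [('rain', 'no', 'mild')]
    print("reducts:", reducts(OBJECTS, attrs, "play"))  # [('outlook', 'windy'), ('windy', 'temp')]
    chi2, v = chi_square_and_cramers_v(contingency_table(OBJECTS, ("outlook",), "play"))
    print(f"chi2={chi2:.3f}, V={v:.3f}")  # chi2=1.500, V=0.500
```

In this toy data the row (rain, no, mild) mixes both decisions, so it forms the boundary region, and both {outlook, windy} and {windy, temp} preserve the positive region. The exhaustive subset search is exponential in the number of attributes and is only meant for small illustrations.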
Pages: 82-91
Page count: 10