Applying a Mutual Information Theory Based Feature Selection Method to a Classifier

Cited by: 0
Author
Lee, Sun-Mi [1]
Affiliation
[1] Catholic Univ Korea, Coll Nursing, Seoul, South Korea
Keywords
Feature Selection; Classifier; Naive Bayes; Data Mining; Prediction
DOI
Not available
Chinese Library Classification number
R-058
Subject classification code
Abstract
Objective: The purpose of this study was to explore the usability of a feature selection method based on mutual information theory for increasing the predictive performance of a classifier in data mining. Methods: The feature selection method was applied to a classifier using the HIV Cost and Services Utilization Study (HCSUS) dataset. Its contribution to the predictive performance of the classifier was evaluated by comparing Naive Bayes (NB) and Logistic Regression (LG) models built with different sets of variables. Infrequent office visits, representing limited health service utilization, were selected as the outcome variable. HUGIN Researcher™ 6.3 was used to train and test the NB models, and SAS® 8.0 was used for the LG modeling. Results: The NB model achieved a higher AUC with the variables selected by the mutual information based feature selection method (AUC = .639; CI = .611, .660) than with the variables defined by a previous study (AUC = .599; CI = .570, .620). There was no difference between the LG models built with the different sets of variables. Conclusion: This study demonstrated that the mutual information method may be useful as a feature selection method for identifying relevant predictors, which can contribute to an increase in the predictive performance of a classifier.
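The abstract does not spell out the selection procedure, so the following is only a minimal sketch of one common way to rank discrete candidate predictors by their empirical mutual information with a binary outcome. The function names (`mutual_information`, `rank_features`), the feature names, and the toy data are all hypothetical and are not taken from the study or from HCSUS.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y), in bits, of two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(columns, outcome):
    """Return (name, MI) pairs sorted from most to least informative."""
    scores = {name: mutual_information(values, outcome)
              for name, values in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up toy data for illustration only.
outcome = [1, 1, 0, 0, 1, 0, 1, 0]
columns = {
    "feature_a": [1, 1, 0, 0, 1, 0, 1, 0],  # identical to the outcome
    "feature_b": [1, 0, 1, 0, 1, 0, 0, 1],  # statistically independent of it
    "feature_c": [1, 1, 1, 0, 1, 0, 0, 0],  # partially informative
}

ranking = rank_features(columns, outcome)
for name, score in ranking:
    print(f"{name}: {score:.3f} bits")
```

In a workflow of this kind, the top-ranked variables would then be passed to the classifier (for example, a Naive Bayes model) and the resulting AUC compared against a model built on a different variable set. The cutoff for how many variables to keep is a separate choice that this sketch does not address.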
Pages: 247-253
Number of pages: 7