PAKDD Data Mining Competition 2009: New Ways of Using Known Methods

Cited by: 0
Authors
Linhart, Chaim [1 ]
Harari, Guy [1 ]
Abramovich, Sharon [2 ]
Buchris, Altina [2 ]
Affiliations
[1] Tel Aviv Univ, Sch Comp Sci, IL-69978 Tel Aviv, Israel
[2] Tel Aviv Univ, Dept Stat & Operat Res, IL-69978 Tel Aviv, Israel
Source
NEW FRONTIERS IN APPLIED DATA MINING | 2010 / Vol. 5669
Keywords
data mining; logistic regression; KNN; credit risk assessment
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The PAKDD 2009 competition focused on the problem of credit risk assessment. The task required us to address the robustness of the credit-scoring model against performance degradation caused by gradual market changes over a few years of business operation. We used the following standard models: logistic regression, KNN, SVM, GBM, and decision trees. The novelty of our approach is twofold: we integrate existing models, namely by feeding the results of KNN as an input variable to the logistic regression, and we re-code categorical variables as numerical values that represent each category's statistical impact on the target label. The best solution we obtained reached PI place in the competition, with an AUC score of 0.655.
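The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the categorical re-coding or the KNN feature were computed, so the smoothed per-category mean of the label, the leave-one-out KNN score, the toy data, and every function and parameter name here (`target_encode`, `knn_score`, `fit_logistic`, `prior_weight`, `skip`) are assumptions made for illustration.

```python
import math

def target_encode(categories, labels, prior_weight=1.0):
    """Map each category to a smoothed mean of the binary label (assumed scheme)."""
    global_rate = sum(labels) / len(labels)
    sums, counts = {}, {}
    for c, y in zip(categories, labels):
        sums[c] = sums.get(c, 0) + y
        counts[c] = counts.get(c, 0) + 1
    # Shrink each category mean toward the global rate to damp rare categories.
    return {c: (sums[c] + prior_weight * global_rate) / (counts[c] + prior_weight)
            for c in counts}

def knn_score(x, train_X, train_y, k=3, skip=None):
    """Fraction of positive labels among the k nearest rows (Euclidean distance).

    `skip` excludes one training index, so a row does not vote for itself.
    """
    dists = sorted(
        (math.dist(x, row), y)
        for i, (row, y) in enumerate(zip(train_X, train_y))
        if i != skip
    )
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Plain batch gradient descent for logistic regression; last weight is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, t in zip(X, y):
            z = w[-1] + sum(wi * xi for wi, xi in zip(w, row))
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xj in enumerate(row):
                grad[j] += (p - t) * xj
            grad[-1] += p - t
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w

# Hypothetical toy data: one numeric feature, one categorical feature, binary label.
numeric = [[0.2], [0.3], [0.8], [0.9], [0.25], [0.85]]
cats = ["A", "A", "B", "B", "A", "B"]
labels = [0, 0, 1, 1, 1, 0]

encoding = target_encode(cats, labels)

# Each row gets two inputs: its leave-one-out KNN score and its encoded category.
features = [
    [knn_score(x, numeric, labels, k=2, skip=i), encoding[c]]
    for i, (x, c) in enumerate(zip(numeric, cats))
]
weights = fit_logistic(features, labels)
```

In this sketch the KNN score is computed leave-one-out so that a training row's own label does not leak into its feature. On real data the same concern applies to the category encoding, which would normally be estimated on a held-out fold.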
Pages: 99 / +
Page count: 2