A Data Feature Extraction Method Based on the NOTEARS Causal Inference Algorithm

Cited by: 2
Authors
Wang, Hairui [1 ]
Li, Junming [1 ]
Zhu, Guifu [2 ]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650504, Peoples R China
[2] Kunming Univ Sci & Technol, Informat Technol Construct Management Ctr, Kunming 650504, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2023, Vol. 13, Issue 14
Funding
National Natural Science Foundation of China
Keywords
causal inference; relevance; feature extraction; compare; FEATURE-SELECTION; REGRESSION; CLASSIFICATION;
DOI
10.3390/app13148438
Chinese Library Classification
O6 [Chemistry]
Subject Classification Code
0703
Abstract
Extracting effective features from high-dimensional datasets is crucial to the accuracy of regression and classification models. Model predictions based on causality are known for their robustness. This paper therefore introduces causality into feature selection and applies Feature Selection based on NOTEARS causal discovery (FSNT) for effective feature extraction. The method recasts structure learning as a numerical optimization problem, enabling rapid identification of the globally optimal causal graph between the features and the target variable. To assess its effectiveness, this paper evaluates FSNT with 10 regression algorithms and 8 classification algorithms on six real datasets from diverse fields and compares the results against three mainstream feature selection algorithms. The results show an average decline of 54.02% in regression prediction error achieved by the FSNT algorithm. The algorithm also performs strongly in classification prediction, improving precision. These findings highlight the effectiveness of FSNT in eliminating redundant features and substantially improving the accuracy of model predictions.
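The record reproduces only the abstract, so no code is given. As a rough sketch of the approach it describes, the Python below implements linear NOTEARS (least-squares loss, l1 penalty, augmented Lagrangian with the trace-exponential acyclicity constraint, following the NOTEARS authors' reference implementation), plus a hypothetical fsnt_select wrapper that appends the target as an extra graph node and keeps features with a direct edge to it. Function names, default parameters, and the edge criterion are assumptions for illustration, not the paper's published API.

```python
# Minimal sketch of NOTEARS-based feature selection, assuming a linear SEM.
# notears_linear follows the NOTEARS authors' reference implementation;
# fsnt_select is a hypothetical wrapper, not the paper's published code.
import numpy as np
import scipy.linalg as slin
import scipy.optimize as sopt

def notears_linear(X, lambda1=0.1, max_iter=100, h_tol=1e-8,
                   rho_max=1e16, w_threshold=0.3):
    """Learn a weighted adjacency matrix W whose nonzeros form a DAG."""
    n, d = X.shape

    def _h(W):
        # Acyclicity measure h(W) = tr(exp(W * W)) - d and its gradient.
        E = slin.expm(W * W)
        return np.trace(E) - d, E.T * W * 2

    def _func(w):
        # w stacks the positive and negative parts of W, so the l1 penalty
        # becomes a smooth linear term under the bound constraints w >= 0.
        W = (w[:d * d] - w[d * d:]).reshape(d, d)
        R = X - X @ W
        loss = 0.5 / n * (R ** 2).sum()
        G_loss = -1.0 / n * X.T @ R
        h, G_h = _h(W)
        obj = loss + 0.5 * rho * h * h + alpha * h + lambda1 * w.sum()
        G = G_loss + (rho * h + alpha) * G_h
        return obj, np.concatenate((G.ravel(), -G.ravel())) + lambda1

    w, rho, alpha, h = np.zeros(2 * d * d), 1.0, 0.0, np.inf
    # Fix the diagonal of W at zero (no self-loops).
    bnds = [(0, 0) if i == j else (0, None)
            for _ in range(2) for i in range(d) for j in range(d)]
    for _ in range(max_iter):
        # Augmented Lagrangian: raise rho until the constraint shrinks enough.
        while rho < rho_max:
            sol = sopt.minimize(_func, w, method='L-BFGS-B',
                                jac=True, bounds=bnds)
            w_new = sol.x
            h_new, _ = _h((w_new[:d * d] - w_new[d * d:]).reshape(d, d))
            if h_new > 0.25 * h:
                rho *= 10
            else:
                break
        w, h = w_new, h_new
        alpha += rho * h
        if h <= h_tol or rho >= rho_max:
            break
    W = (w[:d * d] - w[d * d:]).reshape(d, d)
    W[np.abs(W) < w_threshold] = 0.0  # prune weak edges
    return W

def fsnt_select(X, y, lambda1=0.1, w_threshold=0.3):
    """Hypothetical FSNT-style selection: append the target as an extra node,
    learn the DAG, and keep features sharing a direct edge with the target."""
    Z = np.column_stack([X, y])
    W = notears_linear(Z, lambda1=lambda1, w_threshold=w_threshold)
    t = Z.shape[1] - 1  # index of the target node
    mask = (np.abs(W[:t, t]) > 0) | (np.abs(W[t, :t]) > 0)
    return np.flatnonzero(mask)
```

In practice, lambda1 (sparsity of the learned graph) and w_threshold (edge pruning) would be tuned by validation; both directly determine how many features survive selection.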
Pages: 22