A Novel Approach for Data Feature Weighting Using Correlation Coefficients and Min-Max Normalization

Cited by: 18
Authors
Shantal, Mohammed [1 ]
Othman, Zalinda [1 ]
Bakar, Azuraliza Abu [1 ]
Jha, Sunil
Rataj, Malgorzata
Zhang, Xiaorui
Wang, Jian-Qiang
Affiliation
[1] Univ Kebangsaan Malaysia, Fac Informat Sci & Technol, Ctr Artificial Intelligence Technol, Bangi 43600, Selangor, Malaysia
Source
SYMMETRY-BASEL | 2023 / Vol. 15 / Issue 12
Keywords
data normalization; data standardization; feature weighting; correlation matrix; correlation coefficient; classification method; regression method; feature selection; classification
DOI
10.3390/sym15122185
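Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09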
Abstract
In data analysis and machine learning, feature weighting, that is, assigning each feature a weight that reflects its importance, plays a pivotal role, especially given the interplay between the symmetry of the data distribution and the need to weight individual features differently. Preventing features with large value ranges from dominating features with small ranges is also an essential part of data preparation, which makes choosing an effective normalization method one of the most challenging aspects of machine learning. Alongside normalization, feature weighting offers another way to account for the differing importance of features. One way to measure the dependency between features is the correlation coefficient, which quantifies the strength of the relationship between two features. The integration of normalization with feature weighting in data transformation for classification has not been studied extensively. The goal of this work is to improve the accuracy of classification methods by balancing the normalization step against assigning greater importance to features that are strongly related to the class feature. To achieve this, we combine Min-Max normalization with feature weighting, increasing each feature's values according to its correlation coefficient with the class feature. This paper presents the proposed Correlation Coefficient with Min-Max Weighted (CCMMW) approach, in which the way each feature is normalized depends on its correlation with the class feature. The method was evaluated with logistic regression, support vector machine, k-nearest neighbor, neural network, and naive Bayes classifiers on twenty numerical datasets from the UCI Machine Learning Repository and Kaggle. The empirical results showed that CCMMW significantly improves classification performance for the support vector machine, logistic regression, and neural network classifiers on most datasets.
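The abstract describes CCMMW only at a high level: Min-Max normalize each feature, then increase its values according to its correlation coefficient with the class feature. The sketch below illustrates that idea under explicit assumptions. It uses the Pearson coefficient between each feature and a numerically encoded class label, and it scales each normalized feature by a hypothetical weight of `1 + |r|`. That weight is an illustrative choice, not the formula published in the paper, and the helper names `min_max`, `pearson`, and `ccmmw_sketch` are hypothetical.

```python
import math
from typing import List, Sequence


def min_max(column: Sequence[float]) -> List[float]:
    """Rescale a column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in column]
    return [(v - lo) / span for v in column]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return cov / math.sqrt(sxx * syy)


def ccmmw_sketch(rows: List[List[float]], labels: Sequence[float]) -> List[List[float]]:
    """Min-Max normalize each feature, then scale it by (1 + |r|).

    ASSUMPTION: r is the Pearson correlation between the feature and a
    numerically encoded class label, and (1 + |r|) is a hypothetical
    weight chosen for illustration, not the paper's exact formula.
    """
    n_features = len(rows[0])
    # Work column by column (one list per feature).
    columns = [[row[j] for row in rows] for j in range(n_features)]
    weighted = []
    for col in columns:
        r = pearson(col, labels)
        weighted.append([v * (1 + abs(r)) for v in min_max(col)])
    # Transpose back to one list per sample.
    return [list(row) for row in zip(*weighted)]


if __name__ == "__main__":
    X = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
    y = [0, 1, 2]
    print(ccmmw_sketch(X, y))
```

With this assumed weight, a feature that is uncorrelated with the class stays in [0, 1], while a perfectly correlated feature is stretched to [0, 2], so distance- and margin-based classifiers give it more influence. Computing a Pearson coefficient against the class label is only meaningful when the class is binary or ordinal; multi-class nominal labels would need a different dependency measure.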
Pages: 18