Combining rough set and principal component analysis for preprocessing on commercial data stream

Cited by: 0
Authors
Huijian, Xu [1]
Feipeng, Guo [1]
Affiliation
[1] Information Technology Department, Zhejiang Economic and Trade Polytechnic
Keywords
Attribute reduction; Data stream preprocessing; Principal component analysis; Rough set
DOI
10.4156/jcit.vol7.issue2.16
Abstract
In a data stream environment, the volume of data is large and the attributes are multidimensional, which makes data preprocessing difficult. To address these problems, a heuristic attribute reduction method (RSPCA) that combines rough set theory (RS) with principal component analysis (PCA) is proposed. First, PCA reduces the dimensionality of, and removes noise from, the data in a sliding window over the data stream. This yields decision information composed of principal component variables. Next, because the principal components are relatively independent of one another, the data are partitioned into equivalence classes. On this basis, a rough set attribute reduction algorithm based on equivalence classes is proposed, which uses the significance of the principal component variables as its heuristic function. Finally, a minimal reduct of the decision attributes is obtained for subsequent mining. RSPCA was applied to data preprocessing for customer churn prediction in a business enterprise, and the experimental results suggest that the method handles large volumes of data with multidimensional attributes well.
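The pipeline outlined in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' RSPCA algorithm: the abstract does not give the discretization scheme, the exact significance measure, or any parameter values. The function names, the 90% variance threshold (`var_ratio`), the equal-width binning (`bins`), and the use of rough-set dependency gain as the significance heuristic are all assumptions made for this example.

```python
import numpy as np

def pca_window(X, var_ratio=0.9):
    """Project one sliding window onto its leading principal components.

    var_ratio is an assumed threshold on cumulative explained variance.
    """
    Xc = X - X.mean(axis=0)                       # center each attribute
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)               # variance share per component
    k = int(np.searchsorted(np.cumsum(explained), var_ratio)) + 1
    k = min(k, len(s))
    return Xc @ Vt[:k].T                          # principal component scores

def discretize(Z, bins=3):
    """Equal-width binning (an assumption) so rows can form equivalence classes."""
    out = np.empty(Z.shape, dtype=int)
    for j in range(Z.shape[1]):
        inner_edges = np.linspace(Z[:, j].min(), Z[:, j].max(), bins + 1)[1:-1]
        out[:, j] = np.digitize(Z[:, j], inner_edges)
    return out

def dependency(D, y, attrs):
    """Fraction of rows whose equivalence class (under attrs) has a single decision."""
    cols = np.asarray(attrs, dtype=int)
    classes = {}
    for i, row in enumerate(D[:, cols]):
        classes.setdefault(tuple(row), []).append(i)
    positive = sum(len(idx) for idx in classes.values() if len(set(y[idx])) == 1)
    return positive / len(y)

def greedy_reduct(D, y):
    """Add the attribute with the largest dependency gain until the
    dependency of the full attribute set is reached."""
    all_attrs = list(range(D.shape[1]))
    target = dependency(D, y, all_attrs)
    reduct = []
    while dependency(D, y, reduct) < target:
        remaining = [a for a in all_attrs if a not in reduct]
        best = max(remaining, key=lambda a: dependency(D, y, reduct + [a]))
        reduct.append(best)
    return reduct

# Synthetic window standing in for one slice of a stream (hypothetical data).
rng = np.random.default_rng(0)
window = rng.normal(size=(200, 8))
labels = (window[:, 0] + window[:, 1] > 0).astype(int)

scores = pca_window(window)          # step 1: PCA on the window
table = discretize(scores)           # step 2: discretized decision table
selected = greedy_reduct(table, labels)  # step 3: heuristic attribute reduction
```

A greedy forward search of this kind returns a small attribute subset that preserves the dependency of the full set, but it is not guaranteed to find the minimum reduct; in a streaming setting the three steps would be repeated as the window slides.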
Pages: 132-140
Page count: 8