Distributed Feature Selection for Big Data Using Fuzzy Rough Sets

Cited by: 32
Authors
Kong, Linghe [1]
Qu, Wenhao [1]
Yu, Jiadi [1]
Zuo, Hua [2]
Chen, Guihai [1]
Xiong, Fei [3]
Pan, Shirui [4]
Lin, Siyu [3]
Qiu, Meikang [5]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Computer Sci & Engn, Shanghai 200240, Peoples R China
[2] Univ Technol, Fac Engn & Informat Technol, Sydney, NSW 2007, Australia
[3] Beijing Jiaotong Univ, Sch Elect & Informat Engn, Beijing 100044, Peoples R China
[4] Monash Univ, Fac Informat Technol, Clayton, Vic 3800, Australia
[5] Texas A&M Univ, Dept Comp Sci, Commerce, TX 75428 USA
Funding
National Key Research and Development Program of China;
Keywords
Rough sets; Feature extraction; Big Data; Servers; Distributed databases; Cloud computing; Heuristic algorithms; Big data; distributed feature selection; dynamic data decomposition; fuzzy rough sets; ATTRIBUTE REDUCTION; APPROXIMATIONS;
DOI
10.1109/TFUZZ.2019.2955894
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Fuzzy rough-set-based feature selection is an important technique for big data analysis. However, the classic fuzzy rough set algorithm takes all data correlations into account, which forces a centralized computing mode and demands substantial computing and memory resources. As data volumes grow in the big data era, a centralized server can no longer afford the computation of fuzzy rough sets. To enable fuzzy rough sets for big data analysis, in this article we propose a novel distributed fuzzy rough set (DFRS)-based feature selection method, which separates the tasks and assigns them to multiple nodes for parallel computing. The key challenge is to maintain global information on each distributed node without storing the entire fuzzy relation matrix. We tackle this challenge with a dynamic data decomposition algorithm and a data summarization process on each distributed node. Extensive experiments on multiple real datasets demonstrate that DFRS significantly reduces the runtime, and that its feature selection accuracy is nearly the same as that of traditional centralized computing.
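For orientation, the centralized baseline that the abstract contrasts against can be sketched as follows. This is a minimal illustration of a classic fuzzy-rough dependency measure with greedy forward selection, not the DFRS algorithm of the article. The function names (`fuzzy_similarity`, `dependency`, `greedy_select`), the range-normalized similarity, and the stopping rule are assumptions chosen for the example. Note that every feature needs an n x n relation over all sample pairs, which is the memory cost the abstract refers to.

```python
def fuzzy_similarity(data, feature):
    """n x n fuzzy relation for one feature: 1 - |a(x) - a(y)| / range(a)."""
    column = [row[feature] for row in data]
    spread = (max(column) - min(column)) or 1.0  # constant column: avoid dividing by zero
    return [[1.0 - abs(u - v) / spread for v in column] for u in column]


def dependency(data, labels, features):
    """Fuzzy-rough dependency degree of the labels on a feature subset (assumed form)."""
    n = len(data)
    relations = [fuzzy_similarity(data, f) for f in features]
    total = 0.0
    for i in range(n):
        # Membership of sample i in the fuzzy positive region: one minus its
        # similarity to the most similar sample that carries a different label.
        lower = 1.0
        for j in range(n):
            if labels[j] != labels[i]:
                # Combine per-feature relations with min; an empty subset
                # makes all samples fully similar.
                sim = min((rel[i][j] for rel in relations), default=1.0)
                lower = min(lower, 1.0 - sim)
        total += lower
    return total / n


def greedy_select(data, labels):
    """Forward selection: keep adding the feature that raises dependency the most."""
    remaining = set(range(len(data[0])))
    selected, best = [], 0.0
    while remaining:
        gains = {f: dependency(data, labels, selected + [f]) for f in remaining}
        f, score = max(sorted(gains.items()), key=lambda kv: kv[1])
        if score <= best:  # no feature improves the dependency: stop
            break
        selected.append(f)
        remaining.remove(f)
        best = score
    return selected, best
```

Calling `greedy_select(data, labels)` on a list of numeric rows and a parallel list of class labels returns the chosen feature indices and the final dependency value. The pairwise loops make the quadratic cost in the number of samples explicit.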
Pages: 846-857
Number of pages: 12
Related papers
34 in total
[11]   Incremental fuzzy cluster ensemble learning based on rough set theory [J].
Hu, Jie ;
Li, Tianrui ;
Luo, Chuan ;
Fujita, Hamido ;
Yang, Yan .
KNOWLEDGE-BASED SYSTEMS, 2017, 132 :144-155
[12]   Information-preserving hybrid data reduction based on fuzzy-rough techniques [J].
Hu, QH ;
Yu, DR ;
Xie, ZX .
PATTERN RECOGNITION LETTERS, 2006, 27 (05) :414-423
[13]   Kernelized Fuzzy Rough Sets and Their Applications [J].
Hu, Qinghua ;
Yu, Daren ;
Pedrycz, Witold ;
Chen, Degang .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2011, 23 (11) :1649-1667
[14]   Gaussian kernel based fuzzy rough sets: Model, uncertainty measures and applications [J].
Hu, Qinghua ;
Zhang, Lei ;
Chen, Degang ;
Pedrycz, Witold ;
Yu, Daren .
INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2010, 51 (04) :453-471
[15]   Fuzzy-rough attribute reduction with application to web categorization [J].
Jensen, R ;
Shen, Q .
FUZZY SETS AND SYSTEMS, 2004, 141 (03) :469-485
[16]   New Approaches to Fuzzy-Rough Feature Selection [J].
Jensen, Richard ;
Shen, Qiang .
IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2009, 17 (04) :824-838
[17]   An Intelligent Incremental Filtering Feature Selection and Clustering Algorithm for Effective Classification [J].
Kanimozhi, U. ;
Manjula, D. .
INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2018, 24 (04) :701-709
[18]   Millimeter-Wave Wireless Communications for IoT-Cloud Supported Autonomous Vehicles: Overview, Design, and Challenges [J].
Kong, Linghe ;
Khan, Muhammad Khurram ;
Wu, Fan ;
Chen, Guihai ;
Zeng, Peng .
IEEE COMMUNICATIONS MAGAZINE, 2017, 55 (01) :62-68
[19]  
Kong LH, 2016, IEEE COMMUN MAG, V54, P53, DOI 10.1109/MCOM.2016.7588229
[20]   Data Loss and Reconstruction in Wireless Sensor Networks [J].
Kong, Linghe ;
Xia, Mingyuan ;
Liu, Xiao-Yang ;
Chen, Guangshuo ;
Gu, Yu ;
Wu, Min-You ;
Liu, Xue .
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2014, 25 (11) :2818-2828