MapReduce based improved quick reduct algorithm with granular refinement using vertical partitioning scheme

Times cited: 13
Authors
Sowkuntla, Pandu [1]
Prasad, P. S. V. S. Sai [1]
Affiliation
[1] Univ Hyderabad, Sch Comp & Informat Sci, Hyderabad 500046, Telangana, India
Keywords
Rough sets; MapReduce; Apache Spark; Reduct; Horizontal partitioning; Vertical partitioning; Feature subset selection; Attribute reduction; Rough
DOI
10.1016/j.knosys.2019.105104
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Over the last few decades, rough sets have become an essential technique for feature subset selection through reduct computation in categorical decision systems. In recent years, with the proliferation of MapReduce for distributed/parallel algorithms, several scalable MapReduce-based reduct computation algorithms have been developed for large-scale decision systems. Existing MapReduce-based approaches partition the dataset horizontally (division in object space) across the nodes of the cluster, which requires a complicated shuffle and sort phase. In this work, we propose MR_IQRA_VP, an algorithm designed around vertical partitioning (division in attribute space) of the dataset, which simplifies the shuffle and sort phase of the MapReduce framework. MR_IQRA_VP is a distributed/parallel implementation of the Improved Quick Reduct Algorithm (IQRA_IG) and is implemented using the iterative MapReduce framework of Apache Spark. We conducted an extensive experimental comparison against existing horizontal-partitioning-based reduct computation algorithms on benchmark decision systems. Experimental analysis, together with theoretical validation, establishes that MR_IQRA_VP is suitable for, and scales to, datasets with large attribute spaces and moderate object spaces, such as those prevalent in Bioinformatics and Web mining. (C) 2019 Elsevier B.V. All rights reserved.
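The abstract contrasts horizontal partitioning (splitting objects) with vertical partitioning (splitting attributes). The following is a minimal sequential sketch, not the authors' MR_IQRA_VP or IQRA_IG code. It assumes a categorical decision table stored column-wise and uses the classic dependency-based greedy QuickReduct criterion: repeatedly add the attribute that brings the most objects into the rough-set positive region, until the positive region of the full attribute set is reached. After each step it drops objects already in the positive region. This is an assumed "granular refinement" in the spirit of the improved quick reduct idea. The function names `positive_count` and `quick_reduct_vertical` are hypothetical. No Spark API is used, and the "map" and "reduce" steps are plain Python.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Vertical partitioning (assumed layout): one column of categorical
# values per conditional attribute, indexed by object id.
Columns = Dict[str, List[str]]


def positive_count(columns: Columns, attrs: List[str], decision: List[str],
                   objects: List[int]) -> Tuple[int, List[int]]:
    """Count objects of `objects` in the positive region POS_attrs(D).

    Objects with equal values on `attrs` form one granule; a granule is
    in the positive region iff all of its objects share one decision.
    Returns (count, objects left outside the positive region).
    """
    groups: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
    for obj in objects:
        groups[tuple(columns[a][obj] for a in attrs)].append(obj)
    pos, rest = 0, []
    for members in groups.values():
        if len({decision[o] for o in members}) == 1:
            pos += len(members)
        else:
            rest.extend(members)
    return pos, rest


def quick_reduct_vertical(columns: Columns, decision: List[str]) -> List[str]:
    """Greedy dependency-based QuickReduct (hypothetical sketch)."""
    n = len(decision)
    target, _ = positive_count(columns, list(columns), decision, list(range(n)))
    reduct: List[str] = []
    remaining = list(range(n))  # objects not yet in POS_reduct(D)
    covered = 0                 # |POS_reduct(D)| so far
    while covered < target:
        candidates = [a for a in columns if a not in reduct]
        if not candidates:
            break
        # "Map": score each candidate attribute (one vertical partition)
        # independently; in a cluster, the granules of `reduct` would have
        # to be shared with every partition (assumption).
        scores = [(positive_count(columns, reduct + [a], decision, remaining)[0], a)
                  for a in candidates]
        # "Reduce": keep the first attribute with the largest gain.
        gain, best = max(scores, key=lambda s: s[0])
        reduct.append(best)
        covered += gain
        # Granular refinement (assumed): objects now in the positive region
        # cannot affect later gains, so drop them from further scoring.
        _, remaining = positive_count(columns, reduct, decision, remaining)
    return reduct


if __name__ == "__main__":
    cols = {"a": ["1", "1", "0", "0"],
            "b": ["1", "0", "1", "0"],
            "c": ["x", "x", "x", "y"]}
    dec = ["yes", "no", "no", "no"]
    print(quick_reduct_vertical(cols, dec))  # prints ['a', 'b']
```

Dropping positive-region objects is safe here because every granule of the enlarged attribute set lies inside a granule of the current reduct, so objects in already-consistent granules never share a granule with the remaining objects. Storing columns separately means each candidate is scored from its own column plus the current reduct's columns. This is one plausible reading of why attribute-wise partitioning can simplify the shuffle; the paper itself describes how MR_IQRA_VP actually distributes this work on Spark.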
Pages: 13