A Parallel Matrix-Based Method for Computing Approximations in Incomplete Information Systems

Cited by: 75

Authors:

Zhang, Junbo [1,2]

Wong, Jian-Syuan [2]

Pan, Yi [2]

Li, Tianrui [1]

Affiliations:

[1] Southwest Jiaotong Univ, Sch Informat Sci & Technol, Chengdu 610031, Peoples R China

[2] Georgia State Univ, Dept Comp Sci, Atlanta, GA 30303 USA

Source:

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 2015 / Vol. 27 / No. 02

Funding:

US National Science Foundation;

Keywords:

Rough sets; data mining; MapReduce; matrix; incomplete information systems; ROUGH SETS; ATTRIBUTE REDUCTION; MAPREDUCE;

DOI:

10.1109/TKDE.2014.2330821

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

As the volume of data grows at an unprecedented rate, large-scale data mining and knowledge discovery present a tremendous challenge. Rough set theory, which has been used successfully in solving problems in pattern recognition, machine learning, and data mining, centers around the idea that a set of distinct objects may be approximated via a lower and upper bound. In order to obtain the benefits that rough sets can provide for data mining and related tasks, efficient computation of these approximations is vital. The recently introduced cloud computing model, MapReduce, has gained a lot of attention from the scientific community for its applicability to large-scale data analysis. In previous research, we proposed a MapReduce-based method for computing approximations in parallel, which can efficiently process complete data but fails in the case of missing (incomplete) data. To address this shortcoming, three different parallel matrix-based methods are introduced to process large-scale, incomplete data. All of them are built on MapReduce and implemented on Twister, a lightweight MapReduce runtime system. The proposed parallel methods are then experimentally shown to be efficient for processing large-scale data.
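To make the abstract's notion of lower and upper approximations concrete, the following is a minimal single-machine sketch in Python. It is not the paper's parallel algorithm and does not use MapReduce or Twister. It assumes the standard tolerance relation for incomplete data (two objects are similar when every attribute either matches or is missing in at least one of them), and the `"*"` missing-value marker, the function names, and the toy table are all hypothetical choices made for illustration.

```python
MISSING = "*"  # assumed marker for a missing attribute value


def tolerance_matrix(rows):
    """Build the n x n 0/1 matrix of the (assumed) tolerance relation."""
    def similar(u, v):
        # Objects are similar if each attribute matches or is missing on either side.
        return all(a == b or a == MISSING or b == MISSING for a, b in zip(u, v))

    return [[1 if similar(u, v) else 0 for v in rows] for u in rows]


def approximations(rows, target):
    """Return (lower, upper) approximations of `target`, a set of row indices."""
    matrix = tolerance_matrix(rows)
    indicator = [1 if i in target else 0 for i in range(len(rows))]
    lower, upper = set(), set()
    for i, row in enumerate(matrix):
        # Row-by-vector product: how many objects similar to i lie in the target.
        overlap = sum(m * g for m, g in zip(row, indicator))
        if overlap > 0:
            upper.add(i)       # some similar object is in the target
        if overlap == sum(row):
            lower.add(i)       # every similar object is in the target
    return lower, upper


if __name__ == "__main__":
    table = [
        ("high", "yes"),
        ("high", "*"),
        ("low", "no"),
        ("*", "no"),
    ]
    print(approximations(table, {0, 1}))
```

For the toy table and target set {0, 1}, only object 0 is certainly in the set (lower approximation {0}), while objects 0, 1, and 3 are possibly in it (upper approximation {0, 1, 3}), because the missing values make objects 1 and 3 indistinguishable from each other.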

Citation

Pages: 326-339

Number of pages: 14

References: 36 in total

[1] Amdahl Gene M., 1967, AFIPS 67, DOI 10.1145/1465482.1465560

[2] [Anonymous], 2010, P 19 ACM INT S HIGH, DOI 10.1145/1851476.1851593

[3] [Anonymous], 2004, OSDI 04

[4] [Anonymous], 1998, UCI REPOSITORY MACHI

[5] Bulirsch R., 2002, Introduction to numerical analysis, V3

[6] Dean J, 2004, USENIX ASSOCIATION PROCEEDINGS OF THE SIXTH SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION (OSDI '04), P137

[7] Grzymala-Busse JW, 2005, LECT NOTES COMPUT SC, V3700, P58

[8] Data mining and rough set theory [J]. Grzymala-Busse, JW; Ziarko, W. COMMUNICATIONS OF THE ACM, 2000, 43 (04): 108-109

[9] Mars: A MapReduce Framework on Graphics Processors [J]. He, Bingsheng; Fang, Wenbin; Luo, Qiong; Govindaraju, Naga K.; Wang, Tuyong. PACT'08: PROCEEDINGS OF THE SEVENTEENTH INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, 2008: 260-269

[10] Neighborhood Rough Set Reduction-Based Gene Selection and Prioritization for Gene Expression Profile Analysis and Molecular Cancer Classification [J]. Hou, Mei-Ling; Wang, Shu-Lin; Li, Xue-Ling; Lei, Ying-Ke. JOURNAL OF BIOMEDICINE AND BIOTECHNOLOGY, 2010
