Data Intensive Parallel Feature Selection Method Study

Citations: 0
Authors
Sun, Zhanquan [1]
Li, Zhao [2]
Affiliations
[1] Shandong Comp Sci Ctr, Shandong Prov Key Lab Comp Network, Jinan 250014, Shandong, Peoples R China
[2] Beijing Jiaotong Univ, Sch Software Engn, Beijing 100044, Peoples R China
Source
PROCEEDINGS OF THE 2014 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2014
Keywords
Feature selection; MapReduce; mutual information; contribution degree
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection is an important research topic in machine learning and pattern recognition. It reduces dimensionality, removes irrelevant data, increases learning accuracy, and improves the comprehensibility of results. As data volumes have grown across many application fields, classical feature selection methods have become impractical on large-scale datasets because of their computational cost. This paper studies a data-intensive parallel feature selection method based on the MapReduce programming model. On each map node, a novel method is used to calculate mutual information, and the combinatory contribution degree is used to determine the number of selected features. In each epoch, the features selected by all map nodes are collected at a reduce node, where one feature is selected through synthesis. The method is scalable, and its efficiency is illustrated through an example analysis.
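The map/reduce flow described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' method: the abstract does not specify the "novel" mutual information calculation, the combinatory contribution degree, or the synthesis rule, so the sketch swaps in a plain empirical mutual information estimate, a fixed number of features (`n_select`) and a majority vote at the reduce step. All function and parameter names are hypothetical, and the partitions are processed in a loop rather than on a real MapReduce cluster.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * log(c * n / (px[x] * py[y]))
    return mi


def map_partition(rows, labels, selected, top_k=1):
    """Map step (assumed): rank the not-yet-selected features on one partition."""
    scores = {}
    for j in range(len(rows[0])):
        if j in selected:
            continue
        column = [row[j] for row in rows]
        scores[j] = mutual_information(column, labels)
    # Highest score first; ties go to the lowest feature index
    ranked = sorted(scores, key=lambda j: (-scores[j], j))
    return ranked[:top_k]


def reduce_candidates(candidate_lists):
    """Reduce step (assumed): keep the feature proposed by the most partitions."""
    votes = Counter(j for cands in candidate_lists for j in cands)
    return min(votes, key=lambda j: (-votes[j], j))


def parallel_select(partitions, n_select):
    """One feature per epoch; n_select stands in for the paper's stopping rule."""
    selected = []
    for _ in range(n_select):
        candidates = [map_partition(rows, labels, selected)
                      for rows, labels in partitions]
        selected.append(reduce_candidates(candidates))
    return selected


# Two toy partitions: feature 0 copies the label, features 1 and 2 carry no information
partitions = [
    ([[0, 0, 0], [1, 0, 0], [0, 0, 1], [1, 0, 1]], [0, 1, 0, 1]),
    ([[1, 1, 0], [0, 1, 1], [1, 1, 1], [0, 1, 0]], [1, 0, 1, 0]),
]
chosen = parallel_select(partitions, n_select=2)
```

On the toy data, feature 0 is chosen in the first epoch because it determines the label on both partitions. The sketch scores each feature only by its relevance to the label; a criterion that also accounts for redundancy with already selected features would change `map_partition` but not the overall map/reduce structure.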
Pages: 2256-2262
Page count: 7