Evolutionary algorithm;
Instance selection;
Apache Spark;
Big Data;
Data reduction;
DOI:
10.1016/j.asoc.2024.111638
Chinese Library Classification (CLC) number:
TP18 [Artificial intelligence theory];
Subject classification codes:
081104;
0812;
0835;
1405;
Abstract:
Instance selection is an important preprocessing technique in data mining and machine learning. In this paper, we propose a novel evolutionary-based instance selection algorithm for big data. First, we define a coarse-granularity chromosome structure to reduce the size of the search space and the cost of chromosome operations (e.g., recombination and mutation). Then, we propose a stratified evolution strategy that removes the hyperparameter in the classic fitness function and achieves precise control over the instance reduction ratio. Finally, we propose a sampling-based fitness function to reduce time complexity. Experimental results show that the new algorithm can efficiently complete the instance selection task on datasets with millions of instances within minutes. Results of 10-fold cross-validation also show that the selected instances yield high nearest-neighbor classification accuracy on many datasets.
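The abstract only outlines the method, so the following is a minimal Python sketch under our own assumptions, not the authors' implementation (which targets Apache Spark). It assumes that a "coarse-granularity" chromosome assigns one gene to a fixed-size block of consecutive instances, and it estimates 1-NN accuracy on a random sample of instances instead of the full dataset. The names `decode`, `sampled_1nn_accuracy`, `classic_fitness`, and `block_size` are hypothetical. `classic_fitness` is the weighted accuracy/reduction form common in evolutionary instance selection, with trade-off weight `alpha`; it illustrates the kind of hyperparameter the abstract says the stratified strategy removes, and the stratified strategy itself is not reproduced here.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def decode(chromosome: Sequence[bool], n_instances: int, block_size: int) -> List[int]:
    """Expand a coarse-grained chromosome into selected instance indices.

    Hypothetical encoding: gene g controls the block of instances
    [g * block_size, (g + 1) * block_size), clipped to n_instances.
    """
    selected: List[int] = []
    for g, keep in enumerate(chromosome):
        if keep:
            start = g * block_size
            selected.extend(range(start, min(start + block_size, n_instances)))
    return selected


def sampled_1nn_accuracy(
    X: Sequence[Point],
    y: Sequence[int],
    selected: Sequence[int],
    sample_size: int,
    rng: random.Random,
) -> float:
    """Estimate 1-NN accuracy of the selected subset on a random sample of X."""
    sample = rng.sample(range(len(X)), min(sample_size, len(X)))
    if not selected or not sample:
        return 0.0
    correct = 0
    for i in sample:
        best_j, best_d = -1, math.inf
        for j in selected:
            if j == i:  # leave-one-out: an instance may not classify itself
                continue
            d = math.dist(X[i], X[j])
            if d < best_d:
                best_j, best_d = j, d
        if best_j >= 0 and y[best_j] == y[i]:
            correct += 1
    return correct / len(sample)


def classic_fitness(acc: float, n_selected: int, n_instances: int, alpha: float = 0.5) -> float:
    """Weighted accuracy/reduction fitness; alpha is the trade-off hyperparameter."""
    reduction = 1.0 - n_selected / n_instances
    return alpha * acc + (1.0 - alpha) * reduction


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data (made up): two well-separated 2-D clusters of 50 instances each.
    X = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(50)]
    X += [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(50)]
    y = [0] * 50 + [1] * 50
    block_size = 10
    n_genes = math.ceil(len(X) / block_size)  # 10 genes instead of 100
    chromosome = [g % 2 == 0 for g in range(n_genes)]  # keep every other block
    selected = decode(chromosome, len(X), block_size)
    acc = sampled_1nn_accuracy(X, y, selected, sample_size=30, rng=rng)
    print(len(selected), acc, classic_fitness(acc, len(selected), len(X)))
```

Under this assumed encoding, grouping 100 instances into blocks of 10 shrinks the chromosome from 100 genes to 10, at the cost of keeping or dropping each block as a unit. Sampling cuts the fitness cost from roughly n × |selected| distance computations to sample_size × |selected|.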
Dai, Hong-Ning
Affiliation: Macau Univ Sci & Technol, Fac Informat Technol, Macau, Macao, Peoples R China
;
Wang, Hao
Affiliation: Norwegian Univ Sci & Technol Aalesund, Fac Engn & Nat Sci, Gjovik, Norway; Macau Univ Sci & Technol, Fac Informat Technol, Macau, Macao, Peoples R China
;
Xu, Guangquan
Affiliation: Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Adv Networking, Tianjin, Peoples R China; Macau Univ Sci & Technol, Fac Informat Technol, Macau, Macao, Peoples R China
;
Wan, Jiafu
Affiliation: South China Univ Technol, Sch Mech & Automot Engn, Guangzhou, Guangdong, Peoples R China; Macau Univ Sci & Technol, Fac Informat Technol, Macau, Macao, Peoples R China