A Genetic Algorithm Based Modification on the LTS Algorithm for Large Data Sets

Cited by: 1

Author:

Satman, M. Hakan [1]

Affiliation:

[1] Istanbul Univ, Dept Econometr, TR-34 Istanbul, Turkey

Source:

COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION | 2012 / Vol. 41 / No. 5

Keywords:

C-steps; Genetic algorithms; Least trimmed squares regression; Outliers; Robust regression; TRIMMED SQUARES REGRESSION; OUTLIERS

DOI:

10.1080/03610918.2011.598989

Chinese Library Classification (CLC) number:

O21 [Probability theory and mathematical statistics]; C8 [Statistics]

Subject classification codes:

020208; 070103; 0714

Abstract:

The authors introduce an algorithm for estimating the least trimmed squares (LTS) parameters in large data sets. The algorithm performs a genetic algorithm search to form a basic subset that is unlikely to contain outliers. Rousseeuw and van Driessen (2006) suggested drawing independent basic subsets and iterating C-steps many times to minimize the LTS criterion. The authors' algorithm instead uses a genetic algorithm to form a basic subset and iterates C-steps to compute the value (cost) of the LTS criterion. Genetic algorithms are successful at optimizing nonlinear objective functions, but they are slow in many cases. The genetic algorithm configuration used here can be kept simple because only a small number of observations is searched for in the data. An R package was prepared to perform Monte Carlo simulations of the algorithm. Simulation results show that the algorithm's performance is suitable even for large data sets, because it always performs only a small number of trials.
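As a rough illustration of the idea summarized in the abstract, the Python sketch below (standard library only) uses a small genetic algorithm to search for a basic subset and then refines the best candidate with C-steps. It is not the author's R package. It assumes a single predictor with an intercept, so a basic subset holds p = 2 observations. The GA operators (truncation selection with elitism, union-based crossover, single-index mutation), the function names, and all parameter values are hypothetical choices made for this example. The LTS cost is taken to be the sum of the h smallest squared residuals. A C-step fits OLS to the current subset and then keeps the h observations with the smallest squared residuals, which never increases that cost.

```python
import random

# Hypothetical sketch: one predictor plus intercept, stdlib only.

def ols_fit(xs, ys, idx):
    """OLS fit of y = a + b*x using only the observations in idx."""
    n = len(idx)
    mx = sum(xs[i] for i in idx) / n
    my = sum(ys[i] for i in idx) / n
    sxx = sum((xs[i] - mx) ** 2 for i in idx)
    sxy = sum((xs[i] - mx) * (ys[i] - my) for i in idx)
    b = sxy / sxx if sxx > 0 else 0.0  # degenerate subset: flat line
    return my - b * mx, b

def sq_residuals(xs, ys, coef):
    a, b = coef
    return [(y - a - b * x) ** 2 for x, y in zip(xs, ys)]

def lts_cost(xs, ys, coef, h):
    """LTS criterion: sum of the h smallest squared residuals."""
    return sum(sorted(sq_residuals(xs, ys, coef))[:h])

def c_steps(xs, ys, subset, h, max_iter=20):
    """Fit on subset, keep the h best-fitting points, refit; stop when the cost stops falling."""
    coef = ols_fit(xs, ys, subset)
    cost = lts_cost(xs, ys, coef, h)
    for _ in range(max_iter):
        r = sq_residuals(xs, ys, coef)
        new_subset = sorted(range(len(xs)), key=lambda i: r[i])[:h]
        new_coef = ols_fit(xs, ys, new_subset)
        new_cost = lts_cost(xs, ys, new_coef, h)
        if new_cost >= cost:
            break
        coef, cost = new_coef, new_cost
    return coef, cost

def ga_lts(xs, ys, h, pop_size=20, generations=15, p=2, mut_rate=0.2, seed=0):
    """Hypothetical GA over basic subsets of size p (requires len(xs) > p)."""
    rng = random.Random(seed)
    n = len(xs)

    def fitness(subset):
        # Score a basic subset by its LTS cost after a couple of C-steps.
        return c_steps(xs, ys, subset, h, max_iter=2)[1]

    def crossover(a, b):
        # Child: p distinct indices drawn from the union of both parents.
        return rng.sample(list(set(a) | set(b)), p)

    def mutate(s):
        # With probability mut_rate, swap one index for an unused one.
        s = list(s)
        if rng.random() < mut_rate:
            new = rng.randrange(n)
            while new in s:
                new = rng.randrange(n)
            s[rng.randrange(p)] = new
        return s

    pop = [rng.sample(range(n), p) for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=fitness)[: max(2, pop_size // 4)]  # keep the best quarter
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b)))
        pop = elite + children
    best = min(pop, key=fitness)
    return c_steps(xs, ys, best, h)  # full C-step refinement of the best subset

if __name__ == "__main__":
    rng = random.Random(1)
    xs = [rng.uniform(0, 10) for _ in range(200)]
    ys = [2 + 3 * x + rng.gauss(0, 0.5) for x in xs]
    for i in range(40):  # contaminate 20% of the responses
        ys[i] += 50
    h = (len(xs) + 2 + 1) // 2  # h = floor((n + p + 1) / 2) with p = 2
    (a, b), cost = ga_lts(xs, ys, h)
    print(f"intercept={a:.2f} slope={b:.2f} cost={cost:.2f}")
```

Here 20% of the responses are shifted upward. Because h exceeds the number of contaminated points, the fitted intercept and slope should stay close to the uncontaminated values of 2 and 3. Scoring candidates by a cheap, truncated C-step run and applying full C-steps only to the winner reflects the abstract's point that the GA itself can stay simple. This cost split is a design assumption of the sketch, not a detail taken from the paper.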


Pages: 644-652

Number of pages: 9

50 records in total

[1] A genetic algorithm for clustering on very large data sets
Gasvoda, J
Ding, Q
COMPUTER APPLICATIONS IN INDUSTRY AND ENGINEERING, 2003, : 163 - 167
[2] A Genetic Algorithm Approach for Clustering Large Data Sets
Luchi, Diego
Rodrigues, Alexandre
Varejao, Flavio Miguel
Santos, Willian
2016 IEEE 28TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2016), 2016, : 570 - 576
[3] A Data Allocation Strategy Algorithm for Large Databases based on Genetic Algorithm
Li, Xiaofeng
Tang, Yuanxin
Gong, Cuicui
COMPUTATIONAL MATERIALS SCIENCE, PTS 1-3, 2011, 268-270 : 898 - +
[4] Computing LTS Regression for Large Data Sets
Rousseeuw, Peter J.
Van Driessen, Katrien
DATA MINING AND KNOWLEDGE DISCOVERY, 2006, 12 : 29 - 45
[5] Computing LTS regression for large data sets
Rousseeuw, PJ
Van Driessen, K
DATA MINING AND KNOWLEDGE DISCOVERY, 2006, 12 (01) : 29 - 45
[6] Handling imbalanced data sets with a modification of Decorate algorithm
Kotsiantis, Sotiris B.
INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS IN TECHNOLOGY, 2008, 33 (2-3) : 91 - 98
[7] FCM-based clustering algorithm ensemble for large data sets
Li, Jie
Gao, Xinbo
Tian, Chunna
FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, PROCEEDINGS, 2006, 4223 : 559 - 567
[8] DESCRY: A density based clustering algorithm for very large data sets
Angiulli, F
Pizzuti, C
Ruffolo, M
INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING IDEAL 2004, PROCEEDINGS, 2004, 3177 : 203 - 210
[9] Knowledge Reduction Algorithm for Rough Sets based on Adaptive Genetic Algorithm
Hou Ruidong
Zhang Xiaohui
Pan Wei
Mao Ning
2008 CHINESE CONTROL AND DECISION CONFERENCE, VOLS 1-11, 2008, : 5162 - 5166
[10] An Algorithm for the Principal Component Analysis of Large Data Sets
Halko, Nathan
Martinsson, Per-Gunnar
Shkolnisky, Yoel
Tygert, Mark
SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2011, 33 (05) : 2580 - 2594
