A Genetic Algorithm Based Modification on the LTS Algorithm for Large Data Sets

Cited by: 1

Authors:

Satman, M. Hakan [1]

Affiliation:

[1] Istanbul Univ, Dept Econometr, TR-34 Istanbul, Turkey

Source:

COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION | 2012 / Vol. 41 / No. 05

Keywords:

C-steps; Genetic algorithms; Least trimmed squares regression; Outliers; Robust regression; Trimmed squares regression

DOI:

10.1080/03610918.2011.598989

Chinese Library Classification (CLC) number:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];

Discipline classification code:

020208 ; 070103 ; 0714 ;

Abstract:

The authors introduce an algorithm for estimating the least trimmed squares (LTS) parameters in large data sets. The algorithm performs a genetic algorithm search to form a basic subset that is unlikely to contain outliers. Rousseeuw and van Driessen (2006) suggested drawing independent basic subsets and iterating C-steps many times to minimize the LTS criterion. The authors' algorithm constructs a genetic algorithm to form a basic subset and iterates C-steps to calculate the cost value of the LTS criterion. Genetic algorithms are successful methods for optimizing nonlinear objective functions, but they are slower in many cases. The genetic algorithm configuration in the algorithm can be kept simple because only a small number of observations are searched from the data. An R package has been prepared to perform Monte Carlo simulations on the algorithm. Simulation results show that the performance of the algorithm is suitable even for large data sets because a small number of trials is always performed.
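The C-step iteration mentioned in the abstract can be illustrated with a minimal sketch. It assumes a simple regression with one predictor and an intercept, uses plain Python, and takes the starting subset from the caller; the genetic algorithm search that the paper uses to choose that subset is not shown. The names `ols_line` and `c_steps` are hypothetical and are not taken from the paper or its R package.

```python
from typing import List, Sequence, Tuple


def ols_line(x: Sequence[float], y: Sequence[float],
             idx: Sequence[int]) -> Tuple[float, float]:
    """Fit y = a + b*x by ordinary least squares on the observations in idx."""
    n = len(idx)
    mx = sum(x[i] for i in idx) / n
    my = sum(y[i] for i in idx) / n
    sxx = sum((x[i] - mx) ** 2 for i in idx)
    sxy = sum((x[i] - mx) * (y[i] - my) for i in idx)
    b = sxy / sxx  # assumes the subset has at least two distinct x values
    return my - b * mx, b


def c_steps(x: Sequence[float], y: Sequence[float], start: Sequence[int],
            h: int, max_iter: int = 100) -> Tuple[float, float, float, List[int]]:
    """Iterate concentration steps from a starting subset.

    Each step fits OLS on the current subset, then keeps the h observations
    with the smallest squared residuals. Returns (a, b, cost, subset), where
    cost is the sum of squared residuals over the final subset.
    """
    subset = sorted(start)
    for _ in range(max_iter):
        a, b = ols_line(x, y, subset)
        sq = [(y[i] - a - b * x[i]) ** 2 for i in range(len(x))]
        new_subset = sorted(sorted(range(len(x)), key=lambda i: sq[i])[:h])
        if new_subset == subset:  # the subset no longer changes
            break
        subset = new_subset
    a, b = ols_line(x, y, subset)
    cost = sum((y[i] - a - b * x[i]) ** 2 for i in subset)
    return a, b, cost, subset


# Illustrative data: a clean line y = 2 + 3x with two planted outliers.
x = [float(i) for i in range(10)]
y = [2.0 + 3.0 * xi for xi in x]
y[8], y[9] = 100.0, -50.0
a, b, cost, subset = c_steps(x, y, start=[0, 1], h=8)
```

In this sketch the starting subset `[0, 1]` happens to be outlier-free, so the iteration settles on the eight clean observations. A starting subset that contains outliers may lead to a different local solution, which is why the choice of the basic subset matters.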


Pages: 644-652

Page count: 9
