K-MLIO: Enabling K-Means for Large Data-sets and Memory Constrained Embedded Systems

Cited by: 3

Authors:

Slimani, Camelia [1]

Rubini, Stephane [1]

Boukhobza, Jalil [1]

Affiliation:

[1] Univ Brest, Lab STICC, CNRS, UMR 6285, F-29200 Brest, France

Source:

2019 IEEE 27TH INTERNATIONAL SYMPOSIUM ON MODELING, ANALYSIS, AND SIMULATION OF COMPUTER AND TELECOMMUNICATION SYSTEMS (MASCOTS 2019) | 2019

Keywords:

K-means; I/O optimization; embedded systems; machine learning

DOI:

10.1109/MASCOTS.2019.00037

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Machine Learning (ML) algorithms are increasingly used in embedded systems to perform tasks such as clustering and pattern recognition. These algorithms are both compute and memory intensive, whereas embedded devices offer lower hardware capabilities than traditional ML platforms. K-means clustering is one of the most widely used ML algorithms. For large data-sets, our analysis showed that, on average, more than 70% of the execution time is spent on I/Os. In this paper, we present a version of K-means that drastically reduces the number of I/Os by reading the data-set only once, whereas the traditional version reads it several times, according to the number of iterations performed. Our evaluation showed that the proposed strategy reduces the overall execution time on large data-sets by 60% on average while lowering the number of I/O operations by 90%, with a precision comparable to that of the traditional K-means implementation.
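
The abstract does not say how the single pass over the data-set is achieved. As an illustration only, the sketch below shows one common way to get there, which is an assumption and not necessarily the authors' method: read the data chunk by chunk, run K-means on each chunk while it is in memory, keep only the chunk centroids and their point counts, then run a weighted K-means on those centroids. All names (`weighted_kmeans`, `single_pass_kmeans`, `farthest_first`) are hypothetical, the in-memory lists stand in for chunks read from storage, and the farthest-first initialization is chosen only to keep the example deterministic.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic init: start at the first point, then repeatedly
    pick the point farthest from the centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def weighted_kmeans(points, weights, k, iters=20):
    """Lloyd's iterations on in-memory weighted points.
    Returns (centroids, mass), where mass[j] is the total weight
    assigned to centroid j in the last iteration."""
    k = min(k, len(points))
    centroids = farthest_first(points, k)
    dim = len(points[0])
    mass = [0.0] * k
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        mass = [0.0] * k
        for p, w in zip(points, weights):
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            mass[j] += w
            for d, x in enumerate(p):
                sums[j][d] += w * x
        for j in range(k):
            if mass[j] > 0:  # keep the old centroid if the cluster is empty
                centroids[j] = [s / mass[j] for s in sums[j]]
    return centroids, mass


def single_pass_kmeans(chunks, k):
    """Consume each chunk exactly once, keeping only per-chunk
    centroids and their point counts, then cluster those summaries."""
    summary_pts, summary_wts = [], []
    for chunk in chunks:  # in a real setting, each chunk is one read from storage
        chunk = list(chunk)
        if not chunk:
            continue
        cents, mass = weighted_kmeans(chunk, [1.0] * len(chunk), k)
        for c, m in zip(cents, mass):
            if m > 0:
                summary_pts.append(c)
                summary_wts.append(m)
    final, _ = weighted_kmeans(summary_pts, summary_wts, k)
    return final


# Toy data: two tight blobs, interleaved and split into 5 chunks of 10 points.
blob_a = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
blob_b = [(10 + x, 10 + y) for x, y in blob_a]
data = [p for pair in zip(blob_a, blob_b) for p in pair]
chunks = [data[i:i + 10] for i in range(0, len(data), 10)]

final = sorted(single_pass_kmeans(chunks, 2))
print(final)
```

In this sketch, the iterations of each K-means run touch only data already in memory, so the number of reads from storage is independent of the iteration count. The trade-off is that the final centroids are computed from chunk summaries and are therefore an approximation of what K-means on the full data-set would return.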

Pages: 262-268

Page count: 7

Total: 10 items

[1] Genetic Sampling k-means for Clustering Large Data Sets
Luchi, Diego
Santos, Willian
Rodrigues, Alexandre
Varejao, Flavio Miguel
PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, CIARP 2015, 2015, 9423 : 691 - 698
[2] Zoning by k-means over a large data set
Martinez, Carlos
Lozano, Jesus
de la Fuente, David
Priore, Paolo
Garcia, Nazario
2013 12TH MEXICAN INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (MICAI 2013), 2013, : 65 - 69
[3] A novel K-means hierarchical clustering algorithm for efficient information extraction from large data sets
Shahapurkar, SS
Sundareshan, MK
IKE'03: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE ENGINEERING, VOLS 1 AND 2, 2003, : 390 - 396
[4] Constrained k-means on cluster proportion and distances among clusters for longitudinal data analysis
Usami, Satoshi
JAPANESE PSYCHOLOGICAL RESEARCH, 2014, 56 (04) : 361 - 372
[5] Distance Constrained Data Clustering by Combined k-means Algorithms and Opinion Dynamics Filters
Oliva, Gabriele
La Manna, Damiano
Fagiolini, Adriano
Setola, Roberto
2014 22ND MEDITERRANEAN CONFERENCE ON CONTROL AND AUTOMATION (MED), 2014, : 612 - 619
[6] Optimizing OpenCL Code for Performance on FPGA: k-Means Case Study With Integer Data Sets
Paulino, Nuno
Ferreira, Joao Canas
Cardoso, Joao M. P.
IEEE ACCESS, 2020, 8 : 152286 - 152304
[7] Data-driven prognostics using a combination of constrained K-means clustering, fuzzy modeling and LOF-based score
Diez-Olivan, Alberto
Pagan, Jose A.
Sanz, Ricardo
Sierra, Basilio
NEUROCOMPUTING, 2017, 241 : 97 - 107
[8] Multi-Agents Approach for Data Mining Based k-Means for Improving the Decision Process in the ERP Systems
Mesbahi, Nadjib
Kazar, Okba
Benharzallah, Saber
Zoubeidi, Merouane
INTERNATIONAL JOURNAL OF DECISION SUPPORT SYSTEM TECHNOLOGY, 2015, 7 (02) : 1 - 14
[9] Knowledge acquisition from in-operation data for water supply systems using pls regression and k-means method
Matsuki, Hiroshi
Fujimoto, Yasutaka
IEEJ Transactions on Industry Applications, 2014, 134 (03) : 301 - 307
[10] Anomaly detection using K-Means and long-short term memory for predictive maintenance of large-scale solar (LSS) photovoltaic plant
Zulfauzi, Irfan Adam
Dahlan, Nofri Yenita
Sintuya, Hathaithip
Setthapun, Worajit
ENERGY REPORTS, 2023, 9 : 154 - 158
