Machine Learning with Distributed Data Management and Process Architecture

Cited by: 0
Authors
Baysal, Engin [1]
Bayilmis, Cuneyt [2]
Affiliations
[1] Istanbul Gedik Univ, Gedik Vocat Sch, Istanbul, Turkey
[2] Sakarya University, Comp & Informat Engn, Sakarya, Turkey
Source
2019 4TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ENGINEERING (UBMK) | 2019
Keywords
big data; big data analytics; machine learning; Apache Spark; PySpark; logistic regression; YARN
DOI
10.1109/ubmk.2019.8907073
Chinese Library Classification
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
As technology becomes ever more present in daily life, the volume of data produced has become almost impossible to manage and process, creating a need for new approaches to storage and analysis. Both the growth in data size and the increasing variety of data have necessitated the development of new methods. In this study, distributed data management and analysis tools, developed for data that cannot be processed with traditional approaches, were used. A machine learning application was developed using the Logistic Regression classification algorithm. The application was implemented on a data set obtained from sensors, using PySpark libraries on a Spark cluster created with the Google Cloud service. The working environment, managed by YARN, was observed while the application ran.
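The abstract does not include the authors' code. As a rough illustration of the classification step only, the following pure-Python sketch trains a logistic regression model by summing gradients computed per data partition, loosely mirroring how a distributed engine splits work across workers. It is not PySpark and not the authors' implementation; the synthetic data, the partition count, the learning rate, and all function names are assumptions made for this example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def partition_gradient(rows, weights):
    # "Map" step: log-loss gradient summed over the rows of one partition.
    grad = [0.0] * len(weights)
    for features, label in rows:
        score = sum(w * x for w, x in zip(weights, features))
        error = sigmoid(score) - label
        for j, x in enumerate(features):
            grad[j] += error * x
    return grad


def train(partitions, n_features, lr=0.5, epochs=200):
    # Batch gradient descent; each epoch combines the partition gradients.
    weights = [0.0] * n_features
    n_rows = sum(len(p) for p in partitions)
    for _ in range(epochs):
        # "Reduce" step: add up the per-partition gradients.
        total = [0.0] * n_features
        for part in partitions:
            for j, g in enumerate(partition_gradient(part, weights)):
                total[j] += g
        weights = [w - lr * g / n_rows for w, g in zip(weights, total)]
    return weights


def predict(weights, features):
    score = sum(w * x for w, x in zip(weights, features))
    return 1 if sigmoid(score) >= 0.5 else 0


# Hypothetical sensor-like data (an assumption, not the paper's data set):
# one reading centered at zero plus a constant bias term; the label is 1
# when the reading is positive.
rng = random.Random(0)
rows = []
for _ in range(400):
    reading = rng.random() - 0.5
    rows.append(([reading, 1.0], 1 if reading > 0 else 0))

# Split the rows into 4 partitions, standing in for data spread across workers.
partitions = [rows[i::4] for i in range(4)]
weights = train(partitions, n_features=2)
accuracy = sum(predict(weights, f) == y for f, y in rows) / len(rows)
```

Because the gradient of the log-loss is a sum over rows, the per-partition sums can be computed independently and then added, which is the property that lets this algorithm run on a cluster.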
Pages: 53-57
Number of pages: 5