Machine Learning with Distributed Data Management and Process Architecture

Authors
Baysal, Engin [1]
Bayilmis, Cuneyt [2]
Affiliations
[1] Istanbul Gedik Univ, Gedik Vocat Sch, Istanbul, Turkey
[2] Sakarya Univ, Comp & Informat Engn, Sakarya, Turkey
Source
2019 4TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ENGINEERING (UBMK) | 2019
Keywords
big data; big data analytics; machine learning; Apache Spark; PySpark; logistic regression; YARN
DOI
10.1109/ubmk.2019.8907073
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
As technology plays a larger role in daily life, the data it produces has become almost impossible to manage and process, which has created a need for new approaches to storage and analysis. Growth in both the size and the variety of data has required new methods. This study uses distributed data management and analysis tools developed for data that cannot be processed with traditional methods. A machine learning application was developed using the Logistic Regression classification algorithm. The application was implemented on a data set obtained from sensors, using PySpark libraries on a Spark cluster created with the Google Cloud service. The working environment, managed by YARN, was observed while the application ran.
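The abstract names Logistic Regression as the classifier but this record contains no code. As a rough illustration of that technique only, the sketch below fits a logistic regression by batch gradient descent in plain Python. It is not the authors' implementation and does not use PySpark, Spark, or YARN. The function names, the learning rate, the epoch count, and the two-feature sensor readings are all hypothetical and chosen for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log-loss."""
    n_features = len(rows[0])
    n = len(rows)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # derivative of the log-loss with respect to the logit
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    # Classify as 1 when the estimated probability is at least 0.5.
    p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return 1 if p >= 0.5 else 0


# Hypothetical sensor readings (two features scaled to [0, 1]); not the paper's data.
readings = [
    (0.10, 0.20), (0.20, 0.10), (0.15, 0.30), (0.30, 0.25),
    (0.80, 0.90), (0.90, 0.70), (0.75, 0.85), (0.85, 0.95),
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train_logistic_regression(readings, labels)
predictions = [predict(weights, bias, x) for x in readings]
```

In a distributed setting such as the one the abstract describes, the per-row gradient terms would be computed in parallel across the cluster and summed, which is the part a framework like Spark takes over from the inner loop here.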
Pages: 53-57
Page count: 5
Related papers
50 in total
  • [1] On Scalability of Distributed Machine Learning with Big Data on Apache Spark
    Hai, Ameen Abdel
    Forouraghi, Babak
    BIG DATA - BIGDATA 2018, 2018, 10968 : 209 - 219
  • [2] A Scheme for Data Deduplication Using Advance Machine Learning Architecture in Distributed Systems
    Tarun, Sashi
    Batth, Ranbir Singh
    Kaur, Sukhpreet
    2021 INTERNATIONAL CONFERENCE ON COMPUTING SCIENCES (ICCS 2021), 2021, : 53 - 60
  • [3] Distributed Double Machine Learning with a Serverless Architecture
    Kurz, Malte S.
    COMPANION OF THE ACM/SPEC INTERNATIONAL CONFERENCE ON PERFORMANCE ENGINEERING, ICPE 2021, 2021, : 27 - 33
  • [4] A Distributed Architecture for Smart Recycling Using Machine Learning
    Ziouzios, Dimitris
    Tsiktsiris, Dimitris
    Baras, Nikolaos
    Dasygenis, Minas
    FUTURE INTERNET, 2020, 12 (09):
  • [5] NeuroCrypt: Machine Learning Over Encrypted Distributed Neuroimaging Data
    Senanayake, Nipuna
    Podschwadt, Robert
    Takabi, Daniel
    Calhoun, Vince D.
    Plis, Sergey M.
    NEUROINFORMATICS, 2022, 20 (01) : 91 - 108
  • [6] Managing Distributed Machine Learning Lifecycle for Healthcare Data in the Cloud
    Zeydan, Engin
    Arslan, Suayb S.
    Liyanage, Madhusanka
    IEEE ACCESS, 2024, 12 : 115750 - 115774
  • [7] Data Driven Credit Risk Management Process: A Machine Learning Approach
    Chen, Mingrui
    Dautais, Yann
    Huang, LiGuo
    Ge, Jidong
    ICSSP'17: PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON SOFTWARE AND SYSTEM PROCESS, 2017, : 109 - 113
  • [8] Petuum: A New Platform for Distributed Machine Learning on Big Data
    Xing, Eric P.
    Ho, Qirong
    Dai, Wei
    Kim, Jin Kyu
    Wei, Jinliang
    Lee, Seunghak
    Zheng, Xun
    Xie, Pengtao
    Kumar, Abhimanu
    Yu, Yaoliang
    KDD'15: PROCEEDINGS OF THE 21ST ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2015, : 1335 - 1344
  • [9] NeuroCrypt: Machine Learning Over Encrypted Distributed Neuroimaging Data
    Nipuna Senanayake
    Robert Podschwadt
    Daniel Takabi
    Vince D. Calhoun
    Sergey M. Plis
    Neuroinformatics, 2022, 20 : 91 - 108
  • [10] Machine Learning in Big Data
    Wang, Lidong
    Alexander, Cheryl Ann
    INTERNATIONAL JOURNAL OF MATHEMATICAL ENGINEERING AND MANAGEMENT SCIENCES, 2016, 1 (02) : 52 - 61