Big data and machine learning framework for clouds and its usage for text classification

Times cited: 9
Authors
Pintye, Istvan [1]
Kail, Eszter [1]
Kacsuk, Peter [1, 2]
Lovas, Robert [1, 3]
Affiliations
[1] Institute for Computer Science and Control (SZTAKI), Kende u. 13-17, H-1111 Budapest, Hungary
[2] University of Westminster, Centre for Parallel Computing, London, England
[3] Óbuda University, John von Neumann Faculty of Informatics, Budapest, Hungary
Funding
Hungarian Scientific Research Fund; EU Horizon 2020
Keywords
big data; cloud; machine learning; parallel and distributed execution; reference architectures; text classification; platform; system
DOI
10.1002/cpe.6164
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
Reference architectures for big data and machine learning comprise not only interconnected building blocks but also important considerations regarding scalability, manageability and usability, among others. Even when such reference architectures are leveraged, automating the deployment of distributed toolsets and frameworks on various clouds remains challenging because of the diversity of technologies and protocols. The paper focuses on the widely used Apache Spark cluster with Jupyter as the target framework, and on the Occopus cloud-agnostic orchestrator for automating its deployment and maintenance. The approach has been demonstrated and validated with a new, promising text classification application on MTA Cloud, the OpenStack-based Hungarian academic research infrastructure. The paper explains the concept and the applied components, and illustrates their usage with measurements from a real use case.
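The abstract does not specify the classifier, so the following is only a minimal sketch of the data-parallel pattern that a Spark cluster distributes for a text classification workload. It assumes a multinomial Naive Bayes model over whitespace-tokenized bag-of-words features (a stand-in chosen for illustration, not the paper's method), and the corpus, labels and partition layout are hypothetical. Each partition counts per-class term frequencies independently (map), the partial counts are summed (reduce), and the merged counts drive classification.

```python
from collections import Counter
from functools import reduce
import math

# Hypothetical labelled corpus, pre-split into "partitions" to mimic how
# Spark shards a dataset across worker nodes.
PARTITIONS = [
    [("spark cluster deploy cloud", "infra"), ("budget vote parliament", "politics")],
    [("jupyter notebook spark", "infra"), ("election vote party", "politics")],
]

def map_partition(records):
    """Map step: count (label, term) pairs and documents per label in one partition."""
    term_counts, doc_counts = Counter(), Counter()
    for text, label in records:
        doc_counts[label] += 1
        for term in text.split():
            term_counts[(label, term)] += 1
    return term_counts, doc_counts

def merge(a, b):
    """Reduce step: sum the partial counts produced by two partitions."""
    return a[0] + b[0], a[1] + b[1]

def train(partitions):
    """Run the map step on every partition, then reduce the results into one model."""
    term_counts, doc_counts = reduce(merge, map(map_partition, partitions))
    vocab = {term for (_, term) in term_counts}
    return term_counts, doc_counts, vocab

def classify(model, text):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing; unseen terms are skipped."""
    term_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in doc_counts.items():
        label_total = sum(c for (lab, _), c in term_counts.items() if lab == label)
        score = math.log(n_docs / total_docs)  # log prior
        for term in text.split():
            if term not in vocab:
                continue
            count = term_counts[(label, term)]
            score += math.log((count + 1) / (label_total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

For example, `classify(train(PARTITIONS), "deploy spark on cloud")` returns `"infra"`. On an actual cluster the same computation would typically be expressed with Spark MLlib pipeline stages (for instance `Tokenizer`, `HashingTF` and `NaiveBayes` in `pyspark.ml`) rather than by hand; the explicit map and reduce functions here only make the partition-level parallelism visible.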
Pages: 14