Big data and machine learning framework for clouds and its usage for text classification

Cited by: 9
Authors
Pintye, Istvan [1]
Kail, Eszter [1]
Kacsuk, Peter [1,2]
Lovas, Robert [1,3]
Affiliations
[1] Inst Comp Sci & Control SZTAKI, Kende U 13-17, H-1111 Budapest, Hungary
[2] Univ Westminster, Ctr Parallel Comp, London, England
[3] Obuda Univ, John von Neumann Fac Informat, Budapest, Hungary
Funding
Hungarian Scientific Research Fund; European Union Horizon 2020
Keywords
big data; cloud; machine learning; parallel and distributed execution; reference architectures; text classification; PLATFORM; SYSTEM;
DOI
10.1002/cpe.6164
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Reference architectures for big data and machine learning include not only interconnected building blocks but also important considerations for, among other things, scalability, manageability, and usability. Even when building on such reference architectures, automating the deployment of distributed toolsets and frameworks on various clouds remains challenging because of the diversity of technologies and protocols. The paper focuses on the widely used Apache Spark cluster with Jupyter as the target framework, and on the Occopus cloud-agnostic orchestrator tool for automating its deployment and maintenance stages. The presented approach has been demonstrated and validated with a new, promising text classification application on the Hungarian academic research infrastructure, the OpenStack-based MTA Cloud. The paper explains the concept and the applied components, and illustrates their usage with measurements from a real use case.
Pages: 14