Performance Improvement Algorithms in Big Data Analysis

Times cited: 0
Authors
Metsker, Oleg [1]
Efimov, Egor [1]
Trofimov, Egor [3]
Kopanitsa, Georgy [2]
Bolgova, Ekaterina [2]
Yakovlev, Alexey [1]
Affiliations
[1] Almazov Natl Med Res Ctr, St Petersburg, Russia
[2] ITMO Univ, St Petersburg, Russia
[3] All Russian State Univ Justice, Moscow, Russia
Source
9TH INTERNATIONAL YOUNG SCIENTISTS CONFERENCE IN COMPUTATIONAL SCIENCE, YSC2020 | 2020, Vol. 178
Keywords
Legal tech; performance improvement; CUDA; machine learning;
DOI
10.1016/j.procs.2020.11.040
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This article presents the results of a study on developing methods and algorithms for improving performance in big data analysis and in the training of machine learning models. The growth and complexity of data describing complex processes require new approaches and tools for the scientific and technical community. In the course of the research, algorithms for the analysis of heterogeneous medical and legal records were implemented, and performance improvements were achieved for classification, clustering, and graph calculation problems. Using CUDA, a speedup of more than 95 times was obtained. High-performance technologies are important in the analysis of electronic records because they make it possible to respond adequately to the demands of analysing large volumes of data from information systems. This study shows how to speed up calculations, using the most basic and widespread machine learning tasks as examples. The results can be used to develop a new generation of decision support systems, interactive data analysis systems, and methods for eScience.
(c) 2020 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0). Peer-review under responsibility of the scientific committee of the 9th International Young Scientist Conference on Computational Science.
Pages: 386-393
Page count: 8