Spark Based Distributed Deep Learning Framework For Big Data Applications

Cited by: 0
Authors
Khumoyun, Akhmedov [1 ]
Cui, Yun [1 ]
Hanku, Lee [1 ]
Affiliations
[1] Konkuk Univ, Dept Internet & Multimedia Engn, Seoul, South Korea
Source
2016 INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND COMMUNICATIONS TECHNOLOGIES (ICISCT) | 2016
Keywords
Distributed Computing; Deep Learning; Big Data; Spark; HDFS;
DOI
Not available
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Discipline Code
0808; 0809;
Abstract
Deep Learning architectures, such as deep neural networks, are currently among the hottest emerging areas of data science, especially in Big Data. Deep Learning can be effectively exploited to address major Big Data challenges, including extracting complex patterns from huge volumes of data, fast information retrieval, data classification, semantic indexing, and so on. In this work, we designed and implemented a framework to train deep neural networks using Spark, a fast and general data-flow engine for large-scale data processing. The design is similar to Google's DistBelief framework, which can utilize computing clusters with thousands of machines to train large-scale deep networks. Training Deep Learning models requires extensive data and computation. Our proposed framework can accelerate training by distributing model replicas, trained via stochastic gradient descent, among cluster nodes for data residing on HDFS.
Pages: 5
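The DistBelief-style approach the abstract describes can be sketched in miniature: each worker holds a model replica, runs stochastic gradient descent on its local data shard (standing in for an HDFS partition), and a driver averages the replica parameters between rounds, much as a Spark reduce/aggregate step would. This is an illustrative sketch under those assumptions, not the authors' implementation; all function names and the simple averaging scheme are hypothetical.

```python
import numpy as np

def local_sgd(w, shard, lr=0.05, epochs=5):
    """Plain SGD for least-squares regression on one worker's shard.
    Stands in for the per-replica training step; not the authors' code."""
    X, y = shard
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            grad = (xi @ w - yi) * xi   # gradient of 0.5 * (x.w - y)^2
            w = w - lr * grad
    return w

def distributed_train(shards, dim, rounds=10):
    """Driver loop: broadcast current weights, train one replica per shard,
    then average the replicas (a stand-in for a Spark aggregate step)."""
    w = np.zeros(dim)
    for _ in range(rounds):
        replicas = [local_sgd(w.copy(), shard) for shard in shards]
        w = np.mean(replicas, axis=0)   # parameter averaging across replicas
    return w

# Synthetic, noiseless regression data split into 4 shards,
# mimicking HDFS partitions distributed across cluster nodes.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
X = rng.normal(size=(200, 2))
y = X @ true_w
shards = [(X[i::4], y[i::4]) for i in range(4)]
w = distributed_train(shards, dim=2)
print(np.round(w, 2))
```

In an actual Spark deployment, the shard loop would run as a parallel transformation over an RDD of partitions and the averaging as a reduce on the driver; the sequential loop here only illustrates the data flow.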