Deep learning algorithms have received significant attention in recent years. Their popularity stems from their ability to achieve higher accuracy than conventional machine learning in many research areas, such as speech recognition, image processing, and natural language processing. Deep learning algorithms rely on multiple cascaded layers of non-linear processing units, typically organized as hidden layers of artificial neural networks, for feature extraction and transformation. However, deep learning algorithms require a large amount of computational power and a significant amount of time to train. Fortunately, the training and inference algorithms of deep learning architectures expose abundant data parallelism. In this work, we aim to develop technology that exploits this data parallelism in two ways: 1) by distributing deep learning computation across a Hadoop cluster or cloud of computing nodes, and 2) by using field-programmable gate array (FPGA) hardware acceleration to speed up computationally intensive deep learning kernels. In this paper, we describe a hardware prototype of our accelerated Hadoop deep learning system architecture and report initial performance and energy-reduction results. By accelerating the convolutional layers of a deep convolutional neural network (CNN), we observed a potential speed-up of 12.6x and an energy reduction of 87.5% on a 6-node FPGA-accelerated Hadoop cluster.
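
To make concrete the kind of kernel targeted for FPGA offload, the following is a minimal sketch of a direct 2D convolution loop nest, the dominant computation in a CNN's convolutional layers. The function name, tensor layout, and dimensions here are illustrative assumptions, not the paper's actual accelerator implementation.

```c
#include <stddef.h>

/* Illustrative sketch (not the paper's kernel): a direct 2D convolution
 * over one input feature map producing one output feature map.
 *   in:  H x W input, row-major
 *   k:   K x K filter
 *   out: (H-K+1) x (W-K+1) output
 * The independent (oy, ox) iterations expose the data parallelism that
 * an FPGA accelerator can exploit by unrolling or pipelining the loops. */
void conv2d(const float *in, const float *k, float *out,
            size_t H, size_t W, size_t K)
{
    size_t OH = H - K + 1, OW = W - K + 1;
    for (size_t oy = 0; oy < OH; oy++) {
        for (size_t ox = 0; ox < OW; ox++) {
            float acc = 0.0f;
            for (size_t ky = 0; ky < K; ky++)
                for (size_t kx = 0; kx < K; kx++)
                    acc += in[(oy + ky) * W + (ox + kx)] * k[ky * K + kx];
            out[oy * OW + ox] = acc;
        }
    }
}
```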