Recent work on deep belief networks (DBNs) has shown that large-scale unsupervised feature learning models can dramatically improve performance in many application domains. Training the billions of parameters in such models, for example restricted Boltzmann machines (RBMs), is computationally challenging for modern CPUs. Graphics Processing Units (GPUs) have been employed in many large-scale deep learning models to accelerate training, owing to their massively parallel computing capability. Unfortunately, the limited device memory of a GPU restricts the size of the model that can be trained on a single card, while multi-GPU approaches suffer from inefficient inter-device communication and higher hardware cost. In this paper, we propose a novel memory-efficient algorithm that trains large-scale RBMs on a single GPU without this size restriction while preserving the performance gain of GPU parallel computation. In particular, our experiments demonstrate that the approach uses 75% less memory at the cost of only a 10% performance loss when training large-scale RBMs with billions of parameters.