Dynamic programming offers an exact, general solution method for completely known sequential decision problems, formulated as Markov Decision Processes (MDPs) with a finite number of states. Recently, there has been considerable interest in the adaptive version of the problem, where the task to be solved is not completely known a priori. In such a case, an agent has to acquire the necessary knowledge through learning while simultaneously solving the optimal control or decision problem. A large variety of algorithms, variously known as Adaptive Dynamic Programming (ADP) or Reinforcement Learning (RL), has been proposed in the literature. However, almost invariably such algorithms suffer from slow convergence in terms of the number of experiments needed. In this paper we investigate how the learning speed can be considerably improved by exploiting and combining knowledge accumulated by multiple agents. These agents operate in the same task environment but may follow different trajectories. We discuss methods of combining the knowledge structures associated with the multiple agents, as well as different strategies (with varying overheads) for knowledge communication between agents. Results of simulation experiments are presented, indicating that combining multiple learning agents is a promising direction for improving learning speed. The method also performs significantly better than some of the fastest MDP learning algorithms, such as prioritized sweeping.
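To make the setting concrete, the following is a minimal illustrative sketch, not the paper's algorithm: several Q-learning agents explore the same toy finite MDP along their own trajectories and periodically combine their knowledge structures, here simply by averaging Q-tables. The environment, the merge-by-averaging rule, and all names (`step`, `merge`, `merge_every`, etc.) are hypothetical choices for illustration only; the paper discusses other combination and communication strategies.

```python
# Illustrative sketch only: independent Q-learning agents on a toy chain MDP,
# with periodic knowledge combination by averaging Q-tables (one possible choice).
import random

N_STATES, N_ACTIONS = 5, 2
GAMMA, ALPHA, EPS = 0.9, 0.1, 0.2

def step(state, action):
    """Toy deterministic chain MDP: action 1 moves right, anything else resets.
    Reward 1.0 only when the last state is reached."""
    nxt = state + 1 if action == 1 and state < N_STATES - 1 else 0
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

def q_update(Q, s, a, r, s2):
    """Standard one-step Q-learning update."""
    Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])

def merge(tables):
    """Combine the agents' knowledge by element-wise averaging of Q-values."""
    return [[sum(t[s][a] for t in tables) / len(tables)
             for a in range(N_ACTIONS)] for s in range(N_STATES)]

def run(n_agents=3, episodes=200, merge_every=20):
    Qs = [[[0.0] * N_ACTIONS for _ in range(N_STATES)] for _ in range(n_agents)]
    for ep in range(episodes):
        for Q in Qs:                      # each agent follows its own trajectory
            s = 0
            for _ in range(20):
                a = (random.randrange(N_ACTIONS) if random.random() < EPS
                     else max(range(N_ACTIONS), key=lambda x: Q[s][x]))
                s2, r = step(s, a)
                q_update(Q, s, a, r, s2)
                s = s2
        if (ep + 1) % merge_every == 0:   # periodic knowledge communication
            shared = merge(Qs)
            Qs = [[row[:] for row in shared] for _ in range(n_agents)]
    return Qs[0]

if __name__ == "__main__":
    print(run())
```

In this toy setup, how often `merge` is called stands in for the communication overhead the abstract mentions: more frequent merging spreads each agent's experience sooner but costs more communication.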