This research presents a framework for coordinating multiple intelligent agents within a single virtual environment. Coordination is accomplished via a "next available agent" scheme, while learning is achieved using the Q-learning and Sarsa temporal difference reinforcement learning algorithms. To assess the effectiveness of each learning algorithm, experiments were conducted that measured an agent's ability to learn tasks in static and dynamic environments while using both fixed (FEP) and variable (VEP) epsilon-greedy probability rates. Results show that Sarsa, on average, outperformed Q-learning in almost all experiments. Overall, VEP produced higher percentages of successes and optimal successes than FEP, and showed convergence to the optimal policy as measured by the average number of time steps per episode.
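
For reference, the standard tabular forms of the two temporal difference updates compared here are sketched below; the learning rate $\alpha$ and discount factor $\gamma$ are conventional symbols and are not parameter values taken from this work.

Q-learning (off-policy): $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]$

Sarsa (on-policy): $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]$

The key difference is that Sarsa bootstraps from the action actually selected by the epsilon-greedy policy, whereas Q-learning bootstraps from the greedy action regardless of what the agent actually does next.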