In this article, we introduce a reinforcement learning-based, price-driven demand response management (DRM) mechanism for smart grid systems consisting of prosumers. The proposed approach accounts for the prosumers' behavioral characteristics and models the interactions among all the actors involved in the smart grid system, i.e., the prosumers, the energy management system (EMS), and the utility companies. In particular, an off-policy reinforcement learning algorithm is introduced that enables the EMS to determine the optimal price to announce to the prosumers on an hourly basis, toward minimizing the overall system's cost. In this process, the utility companies' hourly wholesale price and the prosumers' energy generation and consumption characteristics are considered as input. At the same time, the prosumers' optimal amount of purchased energy is determined in real time. The presented numerical results demonstrate that the proposed DRM model handles scenarios with incomplete information regarding the prosumers' energy selling and purchasing patterns more effectively than the state of the art. In addition, a detailed comparative evaluation against other price-based DRM approaches, e.g., cap-based and day-ahead pricing, shows the benefits of the proposed DRM model in adapting in real time to the prosumers' energy demand while jointly minimizing the overall system's long-term cost.
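
To make the hourly pricing loop concrete, the following is a minimal sketch of how an EMS could learn prices with tabular Q-learning, one standard off-policy algorithm. It is an illustration under stated assumptions and not necessarily the algorithm used in this article: the candidate price set, the wholesale price profile, the linear prosumer response, and the cost function (`PRICES`, `wholesale_price`, `base_demand`, `purchased_energy`, `system_cost`) are all hypothetical placeholders.

```python
import random

PRICES = [0.10, 0.15, 0.20, 0.25]  # hypothetical candidate retail prices ($/kWh)
HOURS = 24                          # one state per hour of the day


def wholesale_price(hour):
    """Hypothetical utility wholesale price ($/kWh), higher in the evening peak."""
    return 0.12 if 17 <= hour <= 21 else 0.08


def base_demand(hour):
    """Hypothetical energy (kWh) the prosumers would buy at a zero price."""
    return 5.0 if 17 <= hour <= 21 else 3.0


def purchased_energy(hour, price):
    """Hypothetical prosumer response: purchases fall linearly as the price rises."""
    return max(0.0, base_demand(hour) - 10.0 * price)


def system_cost(hour, price):
    """Hypothetical hourly cost: procurement cost plus a discomfort penalty
    for the demand that the announced price curtails."""
    energy = purchased_energy(hour, price)
    curtailed = base_demand(hour) - energy
    return wholesale_price(hour) * energy + 0.5 * curtailed ** 2


def train(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning over (hour, price index); Q stores expected cost."""
    rng = random.Random(seed)
    q = [[0.0] * len(PRICES) for _ in range(HOURS)]
    for _ in range(episodes):            # each episode is one simulated day
        for hour in range(HOURS):
            # Behavior policy: epsilon-greedy with respect to the lowest cost.
            if rng.random() < epsilon:
                action = rng.randrange(len(PRICES))
            else:
                action = min(range(len(PRICES)), key=lambda i: q[hour][i])
            cost = system_cost(hour, PRICES[action])
            # Off-policy target: the greedy (minimum-cost) value of the next
            # hour, regardless of what the behavior policy will do there.
            next_value = min(q[hour + 1]) if hour + 1 < HOURS else 0.0
            q[hour][action] += alpha * (cost + gamma * next_value - q[hour][action])
    return q


def greedy_prices(q):
    """Price with the lowest learned cost for each hour."""
    return [PRICES[min(range(len(PRICES)), key=lambda i: row[i])] for row in q]
```

Calling `greedy_prices(train())` returns one price per hour of the day. In this toy setting the announced price does not affect the next state, so the learned policy simply tracks the hourly cost; a realistic model would include state such as stored energy or forecast generation.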