Despite the success of Deep Reinforcement Learning (DRL) in radio-resource management for multi-cell wireless networks, applying it to power allocation in ultra-dense 5G-and-beyond networks remains challenging. Existing multi-agent DRL-based methods often adopt a fully centralized approach and overlook the resulting communication overhead. In this paper, we model a multi-cell network as a collaborative multi-agent DRL system and adopt a centralized-training, decentralized-execution approach for accurate, real-time decision-making, thereby eliminating communication overhead during execution. We carefully design the DRL agents' input observations, actions, and rewards to avoid impractical power-allocation policies in multi-carrier systems and to ensure strict compliance with transmit power constraints. Through extensive simulations, we assess the sensitivity of the proposed DRL-based power allocation to different exploration methods and system parameters. The results show that DRL-based power allocation with a continuous action space performs best in complex network environments, whereas simpler settings with fewer subcarriers and users require fewer power-allocation actions and therefore converge rapidly. With a fast exploration rate, DRL-based power allocation with a discrete action space outperforms conventional algorithms, achieving a 36% relative sum-rate increase within 60,000 training episodes.
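To make the transmit-power-constraint claim concrete, the sketch below shows one common way an agent's unconstrained action vector can be projected onto a per-cell power budget before execution. This is an illustrative assumption, not the paper's stated design: the function name, the softmax projection, and the variables (raw_action, p_max, the 4-subcarrier example) are hypothetical and serve only to show how strict budget compliance can be enforced by construction.

import numpy as np

def project_to_power_budget(raw_action: np.ndarray, p_max: float) -> np.ndarray:
    """Map an agent's unconstrained outputs to feasible per-subcarrier powers.

    A softmax projection guarantees non-negative powers that sum exactly to
    the cell's total transmit power budget p_max (illustrative assumption).
    """
    logits = raw_action - raw_action.max()   # shift for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()                 # non-negative, sums to 1
    return p_max * weights                   # allocated powers sum to p_max

# Example: 4 subcarriers, 1 W total budget (hypothetical values)
powers = project_to_power_budget(np.array([0.2, -1.3, 0.7, 0.0]), p_max=1.0)
assert np.isclose(powers.sum(), 1.0) and (powers >= 0).all()

Because the projection is applied inside the agent, no post-hoc clipping or penalty term is needed to keep executed actions within the power constraint; this is one standard way to realize the "strict compliance" property mentioned above.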