Massive MIMO (mMIMO) technology is considered a key enabler for 5G and beyond cellular networks because it allows the formation of highly directional radiation beams in the millimeter-wave (mmWave) band. Specifically, the 5G New Radio (NR) standard uses a codebook-based approach to set the antenna weights, so that both transmission and reception can be steered toward a desired angle. However, when a fixed codebook is used, these angular directions may not be exactly aligned with the optimal path that maximizes the SINR between the transmitter-receiver pair, depending on the granularity of the beams and the codebook size. To address this issue, we propose selecting the analog parameters of the transceiver chain through Deep Reinforcement Learning (DRL). Simulation results show that our approach allows fine-grained refinement of the coarse initial estimates of the Angle-of-Arrival and Angle-of-Departure in mmWave Frequency Range 2 (FR2) for the 5G NR standard, where these estimates are obtained during a reduced initial beam establishment procedure (P-1). We observe that our approach consistently improves the Reference Signal Received Power (RSRP) perceived at the UE side by up to 15%, while reducing the number of Synchronization Signal Blocks (SSBs) by up to a factor of 64 compared to the number that P-1 would need to obtain comparable steering accuracy. Finally, once the trained DRL agent is deployed, it eliminates 100% of the control signals needed for the beam refinement procedures, namely P-2 for transmitter beam refinement and P-3 for receiver beam refinement.
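The codebook misalignment described above can be illustrated with a minimal sketch. It is not the method or simulation setup of this work. It assumes a uniform linear array with half-wavelength spacing, a single line-of-sight path, and a hypothetical codebook of beams spaced uniformly in angle over ±60°. The function names `array_gain`, `make_codebook`, and `best_codebook_beam` are introduced here for illustration only.

```python
import cmath
import math


def array_gain(n_elements, steer_deg, true_deg):
    """Normalized power gain in [0, 1] of a beam steered to steer_deg,
    evaluated for a path arriving from true_deg (assumed half-wavelength ULA)."""
    # With half-wavelength spacing the per-element phase step is pi * sin(angle).
    delta = math.sin(math.radians(true_deg)) - math.sin(math.radians(steer_deg))
    field = sum(cmath.exp(1j * math.pi * n * delta) for n in range(n_elements))
    return abs(field / n_elements) ** 2


def make_codebook(n_beams, lo_deg=-60.0, hi_deg=60.0):
    """Hypothetical fixed codebook: steering angles spaced uniformly in angle."""
    step = (hi_deg - lo_deg) / (n_beams - 1)
    return [lo_deg + k * step for k in range(n_beams)]


def best_codebook_beam(codebook, n_elements, true_deg):
    """Exhaustive sweep: return (steering angle, gain) of the strongest beam."""
    candidates = ((beam, array_gain(n_elements, beam, true_deg)) for beam in codebook)
    return max(candidates, key=lambda pair: pair[1])


if __name__ == "__main__":
    n_elements = 16      # assumed array size
    true_angle = 10.0    # assumed direction of the dominant path, in degrees
    for n_beams in (8, 64):
        beam, gain = best_codebook_beam(make_codebook(n_beams), n_elements, true_angle)
        loss_db = -10.0 * math.log10(gain)
        print(f"{n_beams:2d} beams: best beam {beam:6.2f} deg, "
              f"gain {gain:.3f}, loss {loss_db:.2f} dB")
```

In this toy setting, the coarser codebook leaves a larger residual gain loss because its nearest beam sits farther from the true path direction. Sweeping a finer codebook shrinks that loss, but at the cost of more beams to probe, which is the trade-off that motivates refining the coarse estimate by other means.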