The increasing demand for uninterrupted connectivity emphasizes the pivotal role of Unmanned Aerial Vehicles (UAVs) in facilitating real-time video streaming, despite the challenges posed by highly dynamic air-to-ground communications. On-policy Deep Reinforcement Learning (DRL)-based solutions are designed to optimize specific quality of experience (QoE) objectives, such as video quality and smoothness, under fluctuating network conditions. However, they are sensitive to hyperparameter choices and suffer from poor sample efficiency. To overcome these limitations, we propose an improved off-policy Soft Actor-Critic (SAC) solution, named I-SAC, which provides a strong exploration-exploitation trade-off for UAV-based aerial video streaming. I-SAC trains a neural network by jointly considering the video playback status, UAV flight metrics such as altitude, velocity, and acceleration, and prior network conditions, with the goal of maximizing the overall QoE. We design a new QoE metric that accounts for video quality, video quality oscillations, re-buffering, latency, and bandwidth utilization. We evaluate I-SAC with extensive real-world bandwidth traces, UAV flights, and multi-duration segment datasets. Trace-driven simulation results demonstrate that I-SAC significantly outperforms the closest on-policy and off-policy DRL-based alternatives in terms of QoE, achieving average QoE improvements of up to 54.32% across different testing scenarios.
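A minimal sketch of how a composite QoE metric of this kind might be computed per video segment, assuming a simple weighted linear combination of the five factors named above. The function name, weights, and exact terms are illustrative assumptions, not the paper's actual formulation:

```python
# Hypothetical per-segment QoE in the spirit of the metric described above:
# reward video quality and bandwidth utilization; penalize quality
# oscillations, re-buffering, and latency. All weights are illustrative.

def segment_qoe(quality, prev_quality, rebuffer_s, latency_s, utilization,
                w_q=1.0, w_osc=1.0, w_rb=4.0, w_lat=0.5, w_util=0.5):
    """Return a scalar QoE score for one downloaded video segment."""
    oscillation = abs(quality - prev_quality)  # magnitude of quality switch
    return (w_q * quality
            - w_osc * oscillation
            - w_rb * rebuffer_s
            - w_lat * latency_s
            + w_util * utilization)

# A smooth, high-quality, stall-free segment scores higher than a
# lower-quality segment that switched down and stalled.
good = segment_qoe(quality=4.0, prev_quality=4.0, rebuffer_s=0.0,
                   latency_s=0.2, utilization=0.9)
bad = segment_qoe(quality=2.0, prev_quality=4.0, rebuffer_s=1.5,
                  latency_s=0.8, utilization=0.4)
```

In a DRL formulation such as I-SAC's, a per-segment score of this shape would typically serve as the reward signal that the agent maximizes over an episode.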