Urban rail transit train operations are frequently disturbed by external factors, making schedule adherence challenging. These disturbances can cause cascading train delays, leaving many passengers stranded and compromising system safety, stability, and efficiency. Furthermore, growing concern over the energy consumption of train operations necessitates incorporating energy-saving considerations into train regulation strategies. To address these practical concerns, this paper introduces a novel automatic train regulation learning environment that models train traffic dynamics, passenger flow, and energy consumption as a Markov decision process. In this environment, the automatic train regulation system functions as the learning agent. A multi-step look-ahead deep deterministic policy gradient (DDPG) algorithm is designed for the agent to derive real-time regulation strategies. By anticipating future states, the algorithm ensures robust and reliable decision-making for train operations under disturbed conditions. Real-world experiments, including comparisons with other commonly used automatic train regulation strategies, demonstrate the effectiveness of the proposed method. The generated regulation strategy yields significant improvements in delay reduction, passenger service quality, and energy savings. The proposed methodology not only enhances urban rail transit system performance but also extends deep reinforcement learning applications in intelligent transportation, offering an innovative and promising solution to transportation operational challenges.
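The abstract does not spell out the update rule, but a common reading of a multi-step look-ahead extension to DDPG is an n-step temporal-difference target for the critic: the target accumulates several observed rewards before bootstrapping from the target networks. The Python sketch below is purely illustrative, not the paper's implementation; the function name `n_step_td_target` and the placeholder target networks are assumptions introduced here for clarity.

```python
import numpy as np

def n_step_td_target(rewards, next_state, done, target_actor, target_critic,
                     gamma=0.99):
    """Multi-step (n-step) TD target for a DDPG-style critic (illustrative sketch).

    rewards:    the n rewards r_t, ..., r_{t+n-1} observed along the lookahead.
    next_state: state s_{t+n} reached after the n-step rollout.
    done:       whether the episode terminated within the lookahead window.
    """
    n = len(rewards)
    # Discounted sum of the n observed rewards.
    g = sum((gamma ** k) * r for k, r in enumerate(rewards))
    if not done:
        # Bootstrap from the target critic at the n-th future state, using the
        # target actor's deterministic action (standard DDPG bootstrapping).
        a_next = target_actor(next_state)
        g += (gamma ** n) * target_critic(next_state, a_next)
    return g

# Toy stand-ins for the target networks; the paper's network architectures
# are not specified here, so these placeholders are hypothetical.
target_actor = lambda s: -0.1 * s                            # placeholder policy
target_critic = lambda s, a: float(np.dot(s, s) + a.sum())   # placeholder Q-value

state = np.array([0.5, -1.0])
rewards = [1.0, 0.8, 0.6]  # a three-step lookahead of observed rewards
y = n_step_td_target(rewards, state, done=False,
                     target_actor=target_actor,
                     target_critic=target_critic, gamma=0.99)
print(f"n-step critic target: {y:.3f}")
```

Relative to the one-step DDPG target, this form propagates reward information from several future transitions into each critic update, which is one plausible mechanism for the anticipation of future states described above.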