With the high penetration of renewable energy in the power grid, the fast-response capability of demand-side flexible resources (DSFRs), such as electric vehicles (EVs) and thermostatically controlled loads, has become critical for frequency stability. However, the diverse dynamic characteristics of these heterogeneous resources lead to high modeling complexity. Traditional reinforcement learning methods, which rely on neural networks to approximate value functions, often suffer from training instability and fail to effectively quantify resource regulation costs. To address these challenges, this paper proposes a multi-agent reinforcement learning frequency control method based on a Consistency Model (CM). The model incorporates power, energy, and first-order inertia characteristics to uniformly characterize the response delays and dynamic behavior of EVs and air conditioners (ACs), providing a reduced-order analytical foundation for large-scale coordinated control. On this basis, a policy gradient controller is designed that uses projected gradient descent to keep control actions within their physical bounds. A reward function combining state-deviation penalties and regulation costs is constructed, with penalty factors adjusted dynamically according to resource states so that frequency-regulation resources are dispatched in order of priority. Simulations on the IEEE 39-bus system demonstrate that the proposed method significantly outperforms traditional approaches in frequency deviation, training efficiency, and the economy of frequency regulation.
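For illustration, a unified reduced-order model of the kind described above could take the following form, a minimal sketch assuming a first-order lag with box constraints on power and energy; the symbols (power command u_i, regulation power P_i, energy state E_i, inertia time constant tau_i, and their bounds) are illustrative assumptions, not notation taken from the paper:

\[
\tau_i \,\dot{P}_i(t) = -P_i(t) + u_i(t), \qquad \underline{P}_i \le P_i(t) \le \overline{P}_i,
\]
\[
\dot{E}_i(t) = P_i(t), \qquad \underline{E}_i \le E_i(t) \le \overline{E}_i,
\]

where the time constant \(\tau_i\) captures the response delay of resource \(i\) (e.g., an EV charger or an AC compressor cycle), and the power and energy bounds encode its physical limits. A single parameterization of this form is what permits heterogeneous EVs and ACs to be aggregated and controlled uniformly.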
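Likewise, a hedged sketch of how the projection step and the state-dependent reward might fit together is given below. The function names (project_action, reward), the headroom heuristic, and all parameter values are hypothetical placeholders for illustration, not the paper's actual formulation:

import numpy as np

def project_action(u, p_min, p_max):
    """Project a raw power command onto the box [p_min, p_max]
    (the projection step of projected gradient descent for box constraints)."""
    return np.clip(u, p_min, p_max)

def reward(delta_f, soc, u, w_f=10.0, base_cost=0.1):
    """State-deviation penalty plus a state-dependent regulation cost.

    delta_f : system frequency deviation (Hz)
    soc     : normalized energy state of the resource, in [0, 1]
    u       : regulation power command (per-unit)

    The cost penalty factor grows as the energy state nears its limits,
    so resources with ample headroom are dispatched first.
    """
    headroom = 4.0 * soc * (1.0 - soc)      # 1 at mid-range, 0 at the limits
    penalty_factor = base_cost / max(headroom, 1e-3)
    return -(w_f * delta_f**2 + penalty_factor * u**2)

# Projected gradient step on the action (sketch): descend, then project
# back onto the physically feasible set.
u = np.array([0.8])
grad = np.array([-0.5])                     # gradient of the objective w.r.t. u
u = project_action(u - 0.1 * grad, p_min=-1.0, p_max=1.0)

The quadratic headroom term is only one plausible way to realize the "dynamically adjusted penalty factor": it makes a resource near its energy limits expensive to dispatch, which yields the priority ordering the abstract describes.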