Forecasting future building operation states gives operators insight into the factors that drive building performance, including energy consumption, and helps them optimize those factors. Conventional simulation tools such as EnergyPlus are widely used to predict building behavior, but they often fail to capture the full complexity of real-world operational dynamics, because their outputs depend heavily on the assumptions made during modeling and are affected by the stochasticity of real-world building operation. To address this, this paper investigates the Physics-Informed Deep Spatio-Temporal Graph Neural Network (PISTGNN) Ensemble, which integrates residual learning and physics constraints into an encoder-decoder Diffusion Convolutional Recurrent Neural Network (DCRNN) to predict building operational dynamics 5 minutes ahead. Experimental results show that the Ensemble model reduced RMSE by an average of 44.7% relative to the purely data-driven model across the seasonal test sets, indicating robustness to seasonal variation. Moreover, the model's predictions deviate from the true values by only 0.78% in real-world scenarios, indicating high accuracy and reliability for practical applications. Integrating physics-informed neural network (PINN) constraints also improves the model's ability to limit compounding errors in data-sparse regions, reducing model uncertainty.
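The headline figures above are relative metrics. The sketch below shows one common way to compute them. It assumes that the 44.7% figure is the percentage reduction in RMSE relative to the data-driven baseline, averaged over the per-season test sets, and that the 0.78% deviation is a mean absolute percentage deviation. The paper may define or aggregate these metrics differently. All function names and the toy numbers are hypothetical and are not results from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rmse_improvement_pct(rmse_baseline, rmse_model):
    """Percentage RMSE reduction of a model relative to a baseline (assumed definition)."""
    return 100.0 * (rmse_baseline - rmse_model) / rmse_baseline


def average_seasonal_improvement(per_season):
    """Average the per-season RMSE improvements (hypothetical aggregation).

    per_season maps a season name to a (rmse_baseline, rmse_model) pair.
    """
    gains = [rmse_improvement_pct(b, m) for b, m in per_season.values()]
    return sum(gains) / len(gains)


def mean_pct_deviation(y_true, y_pred):
    """Mean absolute percentage deviation of predictions from true values (assumed metric)."""
    return 100.0 * sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy per-season RMSE pairs for illustration only, not values from the paper.
    seasons = {"winter": (2.0, 1.0), "summer": (4.0, 3.0)}
    print(average_seasonal_improvement(seasons))  # 37.5
```

With these toy numbers the winter set improves by 50% and the summer set by 25%, so the seasonal average is 37.5%. Averaging per-season percentages weights each season equally regardless of its error scale; pooling all test samples before computing RMSE would give a different figure.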