Building Performance Simulation (BPS) is a powerful and widely used technique to evaluate building design and operation strategies prior to construction or retrofitting. However, BPS models often have high computational costs, which is particularly limiting for applications that require a very large number of simulations, such as building design optimization or uncertainty analyses. To overcome this limitation, researchers have turned to surrogate modeling, where a mathematical model, such as a machine learning algorithm, is trained to mimic the performance of a BPS, allowing numerous building design and operation configurations to be tested at low computational cost. Past studies have applied surrogate BPS modeling to predict the impact of building design parameters on energy performance. However, few have considered building operational parameters, such as occupancy, equipment and lighting usage, and thermostat setpoints, which significantly impact energy consumption and peak loads, especially in harsh climate conditions. This paper presents a unique evaluation and comparison of machine learning algorithms as surrogates to BPS predictions of building performance (energy consumption and peak loads) under different operational settings. Results indicate that Extreme Gradient Boosting outperformed all other methods in predictive accuracy, with R² values reaching as high as 0.99 for some models. In contrast, linear regression models were the fastest to train and easiest to interpret while still achieving competitive prediction accuracies (R² values > 0.9). This work provides direct evidence of the ability of machine learning surrogate models to accurately predict building performance under different operational settings. It also offers unique insights into the strengths and weaknesses of white-box and black-box predictive modeling approaches and the effect of dataset size on the results. © 2023 Elsevier B.V. All rights reserved.
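The surrogate modeling idea summarized above can be illustrated with a minimal sketch. It is not the paper's implementation: `toy_simulator` is a hypothetical stand-in for a slow BPS run, the three operational inputs (occupancy fraction, thermostat setpoint, lighting power density), their ranges, and all coefficients are assumptions chosen for illustration, and the surrogate is a plain least-squares linear regression written with the Python standard library only.

```python
import random


def toy_simulator(occupancy, setpoint_c, lighting_w_m2):
    """Hypothetical stand-in for a slow BPS run (assumed, not from the paper).

    Returns a made-up annual energy use value for the given operational inputs.
    """
    return 50.0 + 30.0 * occupancy - 4.0 * (setpoint_c - 20.0) + 2.5 * lighting_w_m2


def sample_inputs(rng):
    """Draw one operational scenario from assumed input ranges."""
    return [rng.uniform(0.0, 1.0), rng.uniform(20.0, 26.0), rng.uniform(5.0, 15.0)]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear_surrogate(xs, ys):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + x for x in xs]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, x):
    """Evaluate the fitted surrogate for one scenario."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))


def r_squared(y_true, y_pred):
    """Coefficient of determination on a held-out set."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# "Expensive" simulation runs used to train the surrogate (with assumed noise).
train_x = [sample_inputs(rng) for _ in range(200)]
train_y = [toy_simulator(*x) + rng.gauss(0.0, 1.0) for x in train_x]

# Held-out scenarios used to check how well the surrogate mimics the simulator.
test_x = [sample_inputs(rng) for _ in range(50)]
test_y = [toy_simulator(*x) for x in test_x]

coefs = fit_linear_surrogate(train_x, train_y)
test_r2 = r_squared(test_y, [predict(coefs, x) for x in test_x])
print(f"Test R^2 of the linear surrogate: {test_r2:.3f}")
```

Once fitted, `predict` evaluates a new operational scenario almost instantly, which is what makes surrogates attractive for optimization or uncertainty analyses. Because the toy simulator here is linear by construction, the linear surrogate fits it very well; a real BPS response is generally nonlinear, which is where a black-box method such as gradient boosting may have an advantage.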