Accurate groundwater level (GWL) prediction models are crucial for the productive use, long-term planning, and management of limited subsurface water supplies. In this research, ensembles of Machine Learning (ML) models were used to improve the accuracy of GWL forecasts in Bangladesh at lead times of up to three weeks. Six advanced ML-based models were developed and assessed using eight performance indices, and an Overall Ranking (OR) was obtained by combining the rankings produced by Grey Relational Analysis (GRA), Variation Coefficient (COV), and Shannon's Entropy (SE). The standalone forecasting models performed very well across the three forecasting horizons, with accuracy values ranging from 0.986 to 0.997 for one-step, 0.971 to 0.999 for two-step, and 0.960 to 0.997 for three-step forecasts at GT3330001. The results also revealed that the three ranking techniques (SE, COV, and GRA), as well as their combined ranking (OR), identified different best-performing models at different prediction horizons and observation wells. Weighted average ensembles of the prediction models were developed by calculating individual model weights using four [...] 0.972, MAE = 0.062 m, and RMSE = 0.123 m for one-step-ahead forecasts at GT3330001. The findings show a consistent trend across the other forecasting horizons and observation wells. Finally, the Dempster-Shafer evidence theory was employed to rank the standalone and ensemble models. The ranking results demonstrated that the BMA-based ensemble consistently secured the top position for all forecasting horizons and observation wells (with weight values of 0.997, 0.991, and 0.987 for one-week, two-week, and three-week ahead forecasts at GT3330001). This study shows that the BMA-based ensemble model can produce more accurate GWL projections at the Bangladesh study location, with potential for application in other regions worldwide.
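
As an illustration of how a Shannon's Entropy (SE) ranking of competing models might be computed from several performance indices, the following minimal Python sketch applies the standard entropy weight method. It is not taken from the study: the min-max normalization, the choice of three indices (R, MAE, RMSE), the model names, and all numerical values are hypothetical and serve only to make the example runnable.

```python
import math


def normalize(matrix, is_benefit):
    """Min-max scale each column to [0, 1] so that larger is always better."""
    n_criteria = len(matrix[0])
    scaled = [[0.0] * n_criteria for _ in matrix]
    for j in range(n_criteria):
        col = [row[j] for row in matrix]
        lo, hi = min(col), max(col)
        for i, x in enumerate(col):
            if hi == lo:
                scaled[i][j] = 1.0  # criterion does not discriminate between models
            elif is_benefit[j]:
                scaled[i][j] = (x - lo) / (hi - lo)
            else:
                scaled[i][j] = (hi - x) / (hi - lo)  # cost index: smaller is better
    return scaled


def entropy_weights(scaled):
    """Entropy weight method: criteria with more contrast get larger weights."""
    m = len(scaled)
    if m < 2:
        raise ValueError("at least two models are required")
    k = 1.0 / math.log(m)
    divergences = []
    for j in range(len(scaled[0])):
        col = [row[j] for row in scaled]
        total = sum(col)
        entropy = 0.0
        for x in col:
            p = x / total
            if p > 0:  # the limit of p*ln(p) as p -> 0 is 0
                entropy -= k * p * math.log(p)
        divergences.append(max(0.0, 1.0 - entropy))
    s = sum(divergences)
    if s <= 1e-12:  # all criteria equally uninformative: fall back to equal weights
        return [1.0 / len(divergences)] * len(divergences)
    return [d / s for d in divergences]


def rank_models(names, matrix, is_benefit):
    """Return (name, score) pairs sorted from best to worst."""
    scaled = normalize(matrix, is_benefit)
    weights = entropy_weights(scaled)
    scores = [sum(w * x for w, x in zip(weights, row)) for row in scaled]
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical values for illustration only (columns: R, MAE in m, RMSE in m).
names = ["model_A", "model_B", "model_C"]
indices = [
    [0.99, 0.05, 0.10],
    [0.97, 0.08, 0.15],
    [0.95, 0.12, 0.20],
]
is_benefit = [True, False, False]  # R: larger is better; MAE, RMSE: smaller is better

for name, score in rank_models(names, indices, is_benefit):
    print(f"{name}: {score:.3f}")
```

In this sketch a model that is best on every index receives a score of 1 and a model that is worst on every index receives 0. Rankings from other techniques (such as GRA or COV) could be combined with this one, but how the study forms its Overall Ranking is not specified here.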