Time-Series Forecasting (TSF) is a growing research area across various domains, including manufacturing, which can benefit from Artificial Intelligence (AI) and Machine Learning (ML) innovations for TSF tasks. Although numerous TSF algorithms have been developed and proposed over the past decades, a critical validation and experimental evaluation of these algorithms, which would hold substantial value for researchers and practitioners, is missing to date. This study aims to fill this research gap by providing a rigorous experimental evaluation of state-of-the-art TSF algorithms on thirteen manufacturing-related datasets, with a focus on their applicability in smart manufacturing environments. Each algorithm was selected based on defined TSF categories to ensure a representative set of state-of-the-art approaches. The evaluation covers scenarios combining two problem categories (univariate and multivariate) and two forecasting horizons (short- and long-term). To assess the performance of the algorithms, the weighted average percent error was calculated for each application, and additional post hoc statistical analyses were conducted to assess the significance of the observed differences. Only algorithms with accessible code from open-source libraries were used, and no hyperparameter tuning was conducted. This approach allowed us to evaluate the algorithms as "out-of-the-box" solutions that can be easily implemented, ensuring their usability within the manufacturing sector by practitioners with limited technical knowledge of ML algorithms, and aligns with the objective of facilitating the adoption of these techniques in Industry 4.0 and smart manufacturing systems. Based on the results, transformer- and MLP-based architectures demonstrated the best performance across the different scenarios, with MLP-based architectures winning the most scenarios.
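The text does not spell out the formula behind the evaluation metric. A minimal sketch, assuming the common definition of the weighted average percent error (WAPE) as the sum of absolute errors normalized by the sum of absolute actuals; the function name and example values are illustrative, not taken from the study:

```python
def wape(actual, forecast):
    """Weighted average percent error (assumed common WAPE definition):
    100 * sum(|y_t - yhat_t|) / sum(|y_t|).
    Unlike MAPE, low-volume periods do not dominate the score."""
    numerator = sum(abs(a - f) for a, f in zip(actual, forecast))
    denominator = sum(abs(a) for a in actual)
    return 100.0 * numerator / denominator

# Hypothetical demand series vs. a forecast
actual = [100, 120, 80, 90]
forecast = [110, 115, 70, 95]
print(round(wape(actual, forecast), 2))  # -> 7.69
```

Because the errors are weighted by the magnitude of the actual values, this metric remains well defined even when individual observations are zero, which is one reason it is often preferred over MAPE for intermittent manufacturing demand data.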
For univariate TSF, PatchTST emerged as the most robust algorithm, particularly for long-term horizons, while for multivariate problems, MLP-based architectures such as N-HITS and TiDE showed superior results. The study also revealed that simpler algorithms such as XGBoost can outperform more complex transformer-based models on certain tasks, challenging the assumption that more sophisticated models inherently produce better results. Additionally, the research highlighted the importance of computational resource considerations, showing significant variations in runtime and memory usage across the algorithms.