Software defect prediction (SDP) is crucial in software engineering because undetected defects can cause significant quality issues, higher maintenance costs, and project delays. By accurately identifying defect-prone areas of a software system, SDP helps mitigate these risks, improving software reliability and reducing overall development costs. Numerous studies have addressed defect prediction, primarily by developing machine learning (ML) and deep learning (DL) models. However, these efforts have often overlooked critical aspects such as optimal feature selection, hyperparameter tuning, and complex patterns within the data. To address these limitations, this study proposes a novel SDP model based on a Stacked Long Short-Term Memory (LSTM) network with an attention mechanism. Feature selection is optimized using the Krill Herd algorithm, and hyperparameters are tuned with the Tree-Structured Parzen Estimator technique. To address class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) is applied to balance the training datasets. The Stacked LSTM architecture is designed to capture complex patterns in the data by exploiting sequential information. The model is evaluated on 12 NASA datasets and 38 Apache Promise datasets using the Area Under the ROC Curve (AUC), F-measure, Recall, and Matthews Correlation Coefficient (MCC). Across these 50 datasets, the proposed model achieved AUC values from 0.829 to 0.999, MCC values from 0.534 to 0.988, F-measure values from 0.753 to 0.994, and Recall values from 0.734 to 0.99.
The proposed model is also compared against state-of-the-art models from existing studies, where it records the highest mean MCC (0.879) and mean AUC (0.971). The statistical significance of these results relative to the compared studies is confirmed using the Scott-Knott test. The findings suggest that the approach is highly effective for SDP, offering better accuracy and reliability than the compared models.
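
For readers unfamiliar with the threshold-based metrics reported above, the following is a minimal sketch of how Recall, F-measure, and MCC follow from a binary confusion matrix. It is an illustration only, not the evaluation code used in this study; the function names `confusion_counts` and `defect_metrics` and the toy labels are assumptions introduced here. AUC is omitted because it requires predicted scores rather than hard labels.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = defective, 0 = clean)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def defect_metrics(y_true, y_pred):
    """Return Recall, F-measure, and MCC; undefined ratios fall back to 0.0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"recall": recall, "f_measure": f_measure, "mcc": mcc}


# Hypothetical toy labels: 3 defective modules, 5 clean modules.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(defect_metrics(y_true, y_pred))
```

Unlike Recall and F-measure, MCC uses all four cells of the confusion matrix, which is why it is commonly reported alongside them on imbalanced defect datasets.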