Creep life is a critical parameter governing the service life and mechanical performance of metallic materials, so predicting it accurately is of substantial practical significance. Traditional creep life prediction models, however, often fail to account for all influencing factors and lack universality. Artificial intelligence algorithms can establish correlations between material composition, processing, and performance directly, without considering physical mechanisms. Yet a model built without regard to physical mechanisms may produce physically unreasonable results and is insufficient to guide new alloy design effectively. An evaluation system that measures both the mathematical and the physical significance of prediction models is therefore required. In this study, a dataset of 704 creep samples of heat-resistant steel was collected, and a variety of machine learning and deep learning algorithms were used to construct creep life prediction models. The models were evaluated for prediction accuracy and for generalization to unseen data, and physical metallurgy theory was introduced to guide the selection of the most suitable model. After a balanced consideration of prediction accuracy, generalization ability, and physical rationality, Support Vector Regression was found to outperform the Convolutional Neural Network. A Genetic Algorithm was then integrated to perform inverse optimization of alloy composition and processing parameters, and this approach identified combinations of composition and processing parameters with high potential for enhanced creep performance. This study introduces a comprehensive model evaluation framework that emphasizes physical meaning and generalization ability in addition to traditional metrics of prediction accuracy, with the aim of providing rational guidance for alloy design.
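The inverse optimization step can be illustrated with a minimal sketch, which is not the implementation used in this study. It assumes a trained regression model is available as a function that maps a candidate (composition plus processing parameters) to a predicted creep life score. Here that model is replaced by a toy stand-in, `predict_log_creep_life`, and the variable names, bounds, and peak location are hypothetical values chosen only for illustration. The genetic algorithm uses tournament selection, blend crossover, Gaussian mutation, and elitism.

```python
import random

# Hypothetical design variables and bounds (illustrative only, not from the study).
BOUNDS = {
    "Cr_wt_pct": (8.0, 12.0),
    "W_wt_pct": (0.0, 3.0),
    "temper_temp_C": (650.0, 800.0),
}
NAMES = list(BOUNDS)


def predict_log_creep_life(x):
    """Stand-in for a trained model's prediction (assumption, not a physical model).

    A toy concave function whose maximum (5.0) lies inside the bounds.
    """
    peak = {"Cr_wt_pct": 9.5, "W_wt_pct": 1.8, "temper_temp_C": 740.0}
    score = 5.0
    for name, value in zip(NAMES, x):
        lo, hi = BOUNDS[name]
        score -= ((value - peak[name]) / (hi - lo)) ** 2
    return score


def random_candidate(rng):
    return [rng.uniform(*BOUNDS[n]) for n in NAMES]


def clip(x):
    # Keep every variable inside its allowed range.
    return [min(max(v, BOUNDS[n][0]), BOUNDS[n][1]) for n, v in zip(NAMES, x)]


def tournament(pop, fitness, rng, k=3):
    # Pick k random individuals and return the fittest of them.
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: fitness[i])]


def crossover(a, b, rng):
    # Blend crossover: each gene is a random convex combination of the parents.
    child = []
    for va, vb in zip(a, b):
        w = rng.random()
        child.append(w * va + (1.0 - w) * vb)
    return child


def mutate(x, rng, rate=0.2, scale=0.05):
    # Gaussian mutation scaled to each variable's range, then clipped to bounds.
    out = []
    for n, v in zip(NAMES, x):
        if rng.random() < rate:
            lo, hi = BOUNDS[n]
            v += rng.gauss(0.0, scale * (hi - lo))
        out.append(v)
    return clip(out)


def genetic_search(pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    pop = [random_candidate(rng) for _ in range(pop_size)]
    for _ in range(generations):
        fitness = [predict_log_creep_life(x) for x in pop]
        elite = pop[max(range(pop_size), key=lambda i: fitness[i])]
        children = [elite]  # elitism: the best candidate survives unchanged
        while len(children) < pop_size:
            p1 = tournament(pop, fitness, rng)
            p2 = tournament(pop, fitness, rng)
            children.append(mutate(crossover(p1, p2, rng), rng))
        pop = children
    best = max(pop, key=predict_log_creep_life)
    return dict(zip(NAMES, best)), predict_log_creep_life(best)


if __name__ == "__main__":
    best, score = genetic_search()
    print(best, score)
```

In practice the stand-in function would be replaced by the prediction call of the selected model, and candidates returned by the search would still need to be screened against physical metallurgy knowledge before being considered for experiment.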