This study aims to enhance the accuracy of flank wear prediction, which is essential for extending tool life and improving machining efficiency, especially in the milling of nickel-based high-temperature alloys, where wear behavior is complex and processing conditions vary. Features that correlate strongly with tool wear and are minimally sensitive to process parameters are selected based on comprehensive evaluation indicators. A multi-head self-attention one-dimensional convolutional long short-term memory (MCL) model is developed for predicting tool wear. To enhance generalization across machining conditions, a meta-learning approach is employed. The prediction framework integrates data-driven statistical features with physically derived milling coefficients and tool life features to enhance model accuracy. The proposed method is validated on both the NASA dataset and a self-built dataset. Experimental results show that the MCL model achieves an $R^2$ score of 0.9369, outperforming the multi-head self-attention bidirectional long short-term memory (MB) model at 0.9057 and significantly exceeding the XGBoost model, which scored 0.5241.
Incorporating features based on physical models enhances predictive accuracy, as removing the tool life ($X_{life}$), milling coefficient ($X_{cut}$), and traditional statistical ($X_{trad}$) features results in a reduction of the $R^2$ score by 43.70%, 18.10%, and 29.57%, respectively. Additionally, applying meta-learning further improves performance, increasing the $R^2$ score from 0.9369 to 0.9565, which represents a 2.09% improvement. The results demonstrate that integrating physical and statistical features enhances the accuracy and robustness of tool wear prediction. The proposed MCL model outperforms conventional approaches, with meta-learning further improving performance.
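
The reported scores can be checked with a short calculation. The following is a minimal sketch, assuming that $R^2$ is the standard coefficient of determination and that the 2.09% figure is a relative change measured against the 0.9369 baseline. The function names `r_squared` and `relative_change_percent` are illustrative and do not come from the study.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares between measured and predicted values.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares around the mean of the measured values.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the measured values are constant")
    return 1.0 - ss_res / ss_tot


def relative_change_percent(baseline: float, updated: float) -> float:
    """Relative change of `updated` with respect to `baseline`, in percent."""
    return (updated - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Scores quoted in the abstract: MCL without and with meta-learning.
    gain = relative_change_percent(0.9369, 0.9565)
    print(f"{gain:.2f}%")  # prints 2.09%
```

Under this reading, the gain from meta-learning is (0.9565 − 0.9369) / 0.9369 ≈ 2.09%, which matches the stated figure.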