Learning from mistakes: Sampling strategies to efficiently train machine learning models for material property prediction

Cited by: 13
Authors
Magar, Rishikesh [1]
Farimani, Amir Barati [1]
Affiliation
[1] Carnegie Mellon Univ, Dept Mech Engn, Pittsburgh, PA 15213 USA
Funding
Andrew W. Mellon Foundation (USA)
Keywords
Machine learning; Sampling algorithms; Discovery; Networks
DOI
10.1016/j.commatsci.2023.112167
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
Recent advances in machine learning (ML) methodologies have accelerated the prediction of the physical properties of materials. These ML models, however, rely on large amounts of simulated or experimental data to make reliable predictions. This dependence can be a roadblock to building ML models, since collecting the data is prohibitively expensive and time-consuming. In this work, we propose two sampling strategies to reliably train machine learning models with as little data as possible. Our algorithms alleviate the need to generate large datasets for training. We demonstrate the effectiveness of these sampling strategies by improving the performance of the Crystal Graph Convolutional Neural Network (CGCNN) on four different datasets. Using the proposed strategies, we can reach the benchmark performance of CGCNN models with fewer data samples.
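Illustrative note: the abstract does not describe the two sampling strategies, so the sketch below is not the authors' method. It only shows one generic way a "learning from mistakes" loop could look, under stated assumptions: a labeled pool is already available, a least-squares line stands in for a real model such as CGCNN, and the function names and parameters (`fit_line`, `error_driven_sampling`, `n_seed`, `n_add`, `n_rounds`) are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit y ~ a*x + b (a toy stand-in for a real model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx


def error_driven_sampling(xs, ys, n_seed=5, n_add=5, n_rounds=3, seed=0):
    """Grow a training set by repeatedly adding the worst-predicted samples.

    Assumption: labels for the whole pool are known, so the goal is to pick
    a small informative training subset, not to decide what to label next.
    """
    rng = random.Random(seed)
    train = set(rng.sample(range(len(xs)), n_seed))  # random starting subset
    for _ in range(n_rounds):
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        pool = [i for i in range(len(xs)) if i not in train]
        # Rank the remaining samples by absolute prediction error, largest first.
        pool.sort(key=lambda i: abs(a * xs[i] + b - ys[i]), reverse=True)
        train.update(pool[:n_add])  # add the current model's biggest mistakes
    return sorted(train)


# Toy data: a quadratic target that a straight line cannot fit well.
xs = [i / 10 for i in range(100)]
ys = [x ** 2 for x in xs]
selected = error_driven_sampling(xs, ys)
print(len(selected))
```

With the defaults, the loop starts from 5 random samples and adds 5 per round for 3 rounds. Whether the paper's strategies rank samples by error in this way cannot be confirmed from the abstract alone.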
Pages: 8