Interpretable machine learning boosting the discovery of targeted organometallic compounds with optimal bandgap

Authors
Park, Taehyun [1]
Song, JunHo [1]
Jeong, Jinyoung [1]
Kang, Seungpyo [1]
Kim, Joonchul [1]
Won, Joonghee [2]
Han, Jungim [2]
Min, Kyoungmin [1]
Affiliations
[1] Soongsil Univ, Sch Mech Engn, 369 Sangdo Ro, Seoul 06978, South Korea
[2] Samsung Adv Inst Technol, POC TU, Suwon 16678, Gyeonggi Do, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Organometallic compounds; High-throughput calculations; Feature engineering; Machine learning; Active learning; ENERGY; ENVIRONMENT; PREDICTION; EFFICIENCY; SELECTION; LIMIT; GAPS;
DOI
10.1016/j.mtadv.2024.100520
Chinese Library Classification number
T [Industrial Technology];
Discipline classification code
08;
Abstract
Organometallic compounds (OMCs) have attracted tremendous attention in various fields, such as photovoltaic cells and high-k dielectric applications, owing to their beneficial properties. Despite this potential, the progression of OMCs into industrial applications is hindered by the limited databases available for their properties and the absence of efficient surrogate models. To address this, this study constructs surrogate models based on optimally selected features for predicting the electronic properties of OMCs, using various multiscale features and an extensive database. To this end, high-throughput calculations were performed to obtain the electronic properties of more than 18k materials generally known as organometallics, augmenting around 12k organic materials obtained from the public open data set OMDB-GAP1. To generate features closely related to OMCs, descriptors encapsulating information ranging from local to global were employed, along with other widely used composition- and structure-based features (more than 3.5k in total). Among these descriptors, we identified 48 critical features that elucidate the physicochemical underpinnings of OMCs' properties. The light gradient boosting machine model achieved high-accuracy predictions across the entire database with just 1 % of the total descriptors, performing comparably to the full descriptor set (a difference of around 0.01 in R2 score and 0.01 eV in MAE). Furthermore, the efficacy of the active learning process in rapidly finding OMCs with optimal properties was demonstrated. Expected improvement outperformed the other methods, identifying 69 % of the target materials while searching only 46 % of the total search space. Our constructed platform, together with the high-throughput calculated database, can pave the way for the rapid screening of OMCs for targeted industrial applications and offers a comprehensive grasp of the intrinsic properties of OMCs and related compounds.
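The abstract names expected improvement as the best-performing acquisition strategy in the active learning loop but does not give its form. The following is a minimal sketch of the standard expected improvement formula for a maximization problem, assuming the surrogate supplies a Gaussian predictive mean and standard deviation for each candidate. The function names, the candidate pool, and the way the score is defined (for example, the negative distance of a predicted bandgap from a target value) are illustrative assumptions, not details taken from the paper.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean: float, std: float, best: float) -> float:
    """Expected improvement over `best` for a score to be maximized.

    Assumes a Gaussian predictive distribution N(mean, std^2).
    """
    improvement = mean - best
    if std <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(improvement, 0.0)
    z = improvement / std
    return improvement * normal_cdf(z) + std * normal_pdf(z)


def select_next(candidates, best):
    """Return the index of the candidate with the largest expected improvement.

    candidates: list of (predicted_mean, predicted_std) pairs.
    """
    scores = [expected_improvement(m, s, best) for m, s in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical surrogate predictions (mean, std) for three unlabeled candidates.
pool = [(0.90, 0.01), (0.85, 0.30), (0.50, 0.05)]
next_index = select_next(pool, best=1.0)
```

In this toy pool the second candidate is selected even though its predicted mean is not the highest, because its larger uncertainty leaves a meaningful chance of exceeding the current best. That trade-off between exploitation and exploration is what lets this kind of search find target materials without evaluating the whole search space.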
Pages: 14