Interpretable machine learning boosting the discovery of targeted organometallic compounds with optimal bandgap

Times cited: 0
Authors
Park, Taehyun [1]
Song, JunHo [1]
Jeong, Jinyoung [1]
Kang, Seungpyo [1]
Kim, Joonchul [1]
Won, Joonghee [2]
Han, Jungim [2]
Min, Kyoungmin [1]
Affiliations
[1] Soongsil University, School of Mechanical Engineering, 369 Sangdo-ro, Seoul 06978, South Korea
[2] Samsung Advanced Institute of Technology, POC TU, Suwon 16678, Gyeonggi-do, South Korea
Funding
National Research Foundation, Singapore
Keywords
Organometallic compounds; High-throughput calculations; Feature engineering; Machine learning; Active learning; ENERGY; ENVIRONMENT; PREDICTION; EFFICIENCY; SELECTION; LIMIT; GAPS
DOI
10.1016/j.mtadv.2024.100520
Chinese Library Classification (CLC) number
T [Industrial Technology]
Subject classification code
08
Abstract
Organometallic compounds (OMCs) have attracted tremendous attention in various fields, such as photovoltaic cells and high-k dielectric applications, owing to their beneficial properties. Despite this potential, the progression of OMCs into industrial applications is hindered by the limited databases available for their properties and the absence of efficient surrogate models. To address this, in this study, surrogate models based on optimally selected features are constructed to predict the electronic properties of OMCs, using a variety of multiscale features and an extensive database. To this end, high-throughput calculations were performed to obtain the electronic properties of more than 18k materials generally classified as organometallics, supplementing around 12k organic materials obtained from the public open dataset OMDB-GAP1. To generate features closely related to OMCs, descriptors encapsulating information ranging from local to global scales were employed, together with other widely used composition- and structure-based features (more than 3.5k in total). Among these descriptors, we identified 48 critical features that elucidate the physicochemical underpinnings of OMC properties and shed light on their impact on those properties. The light gradient boosting machine (LightGBM) model achieved high-accuracy predictions across the entire database with just 1 % of the total descriptors, performing comparably to a model using the entire descriptor set (a performance loss of only around 0.01 in R² score and 0.01 eV in MAE). Furthermore, active learning was demonstrated to find OMCs with optimal properties rapidly: the expected improvement (EI) acquisition function outperformed the other methods, identifying 69 % of the target materials while searching only 46 % of the total search space. Our platform, together with the high-throughput calculated database, can pave the way for rapid screening of OMCs for targeted industrial applications and offer a comprehensive understanding of the intrinsic properties of OMCs and related compounds.
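The abstract credits the expected improvement (EI) criterion with the best active-learning performance. As a minimal sketch of how EI ranks unlabeled candidates (not the authors' implementation), the Python below scores each candidate from a surrogate model's predicted mean and standard deviation and picks the highest-scoring one. It assumes a Gaussian predictive distribution and a maximization objective, for example the negative distance between a predicted bandgap and a hypothetical target value; the material ids, predicted values, and the `xi` exploration margin are illustrative assumptions.

```python
import math


def norm_pdf(z: float) -> float:
    """Standard normal probability density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu: float, sigma: float, best: float, xi: float = 0.0) -> float:
    """EI for maximization, assuming the surrogate predicts Normal(mu, sigma**2).

    best: best objective value observed so far among labeled materials.
    xi:   optional exploration margin (an illustrative assumption; 0 by default).
    """
    improvement = mu - best - xi
    if sigma <= 0.0:
        # No predictive uncertainty: EI reduces to the positive part of the improvement.
        return max(improvement, 0.0)
    z = improvement / sigma
    return improvement * norm_cdf(z) + sigma * norm_pdf(z)


def select_next(candidates: dict, best: float, xi: float = 0.0) -> str:
    """Return the id of the unlabeled candidate with the highest EI.

    candidates: hypothetical mapping of material id -> (mean, std) predicted
    by some probabilistic surrogate model.
    """
    return max(candidates, key=lambda cid: expected_improvement(*candidates[cid], best, xi))


# Hypothetical pool; the objective is assumed to be -|predicted bandgap - target| in eV.
pool = {
    "omc_a": (-0.30, 0.05),
    "omc_b": (-0.40, 0.50),
    "omc_c": (-0.10, 0.01),
}
print(select_next(pool, best=-0.20))  # prints "omc_b"
```

In this toy pool, `omc_b` is selected even though `omc_c` has the higher predicted mean, because its large uncertainty carries more expected upside. This balance between exploiting confident predictions and exploring uncertain ones is what EI is designed to strike; in a full loop, the chosen material would be labeled (e.g. by a calculation), the surrogate refit, and the selection repeated.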
Pages: 14