Using word embeddings in abstracts to accelerate metallocene catalysis polymerization research

Cited by: 4
Authors
Ho, David [1]
Shkolnik, Albert S. [1]
Ferraro, Neil J. [1]
Rizkin, Benjamin A. [1]
Hartman, Ryan L. [1]
Affiliation
[1] NYU, Dept Chem & Biomol Engn, 6 Metrotech Ctr, Brooklyn, NY 11201 USA
Funding
National Science Foundation (USA)
Keywords
Machine learning; Metallocene catalysis; Word embeddings; Polymerization; Natural language; Knowledge
DOI
10.1016/j.compchemeng.2020.107026
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Natural language processing (NLP) and neural networks trained to produce word embeddings were investigated as a more efficient method of extracting useful information on catalytic polymerizations. Thousands of abstracts on metallocene-catalyzed polymerizations were accessed through journal application programming interfaces (APIs). These abstracts were then used to train a group of related word-embedding models with the word2vec algorithm, which uses unsupervised training to map each vocabulary word to a high-dimensional vector. The resulting vectors can be used to show relationships between chemicals, suggest catalyst and activator combinations, interpret acronyms, and categorize chemical compounds by reagent class. We hypothesize that understudied areas of metallocene catalysis can be identified by comparing the model-predicted catalyst combinations with those found in existing abstracts, thereby guiding research toward major breakthroughs as the scientific literature continues to grow. © 2020 Elsevier Ltd. All rights reserved.
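The abstract describes using word2vec vectors to relate chemicals and to sort compounds into reagent classes. The sketch below illustrates two common ways of querying such vectors: ranking words by cosine similarity, and assigning a word to the class whose labelled seed words it most resembles (a nearest-centroid rule). Both query methods are assumptions chosen for illustration; the abstract does not state which method the authors used. The vocabulary, the 3-dimensional vectors, and the seed labels (`EMBEDDINGS`, `SEEDS`) are hypothetical toy values, not data or results from the paper. In practice the vectors would come from a word2vec model trained on the collected abstracts.

```python
import math

# HYPOTHETICAL toy embeddings, invented for illustration only (not results
# from the paper). Real word2vec vectors would be learned from the abstract
# corpus and usually have many more dimensions.
EMBEDDINGS: dict[str, list[float]] = {
    "zirconocene": [0.9, 0.1, 0.0],
    "hafnocene": [0.8, 0.2, 0.1],
    "titanocene": [0.85, 0.05, 0.1],
    "mao": [0.1, 0.9, 0.0],
    "borate": [0.2, 0.8, 0.1],
    "ethylene": [0.0, 0.1, 0.9],
    "propylene": [0.1, 0.0, 0.95],
}

# HYPOTHETICAL seed words labelled by reagent class.
SEEDS: dict[str, list[str]] = {
    "catalyst": ["zirconocene", "titanocene"],
    "activator": ["mao"],
    "monomer": ["ethylene"],
}


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word: str, topn: int = 3) -> list[tuple[str, float]]:
    """Rank every other vocabulary word by cosine similarity to `word`."""
    target = EMBEDDINGS[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in EMBEDDINGS.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


def centroid(words: list[str]) -> list[float]:
    """Component-wise mean of the vectors for `words`."""
    vectors = [EMBEDDINGS[w] for w in words]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def classify(word: str, seeds: dict[str, list[str]]) -> str:
    """Return the class whose seed centroid has the highest cosine similarity to `word`."""
    vec = EMBEDDINGS[word]
    return max(seeds, key=lambda label: cosine(vec, centroid(seeds[label])))


if __name__ == "__main__":
    print(most_similar("zirconocene", topn=2))
    for w in ("hafnocene", "borate", "propylene"):
        print(w, "->", classify(w, SEEDS))
```

Because cosine similarity ignores vector length, both queries depend only on the direction of each vector, which is the usual convention when comparing word2vec embeddings.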
Pages: 8