Using word embeddings in abstracts to accelerate metallocene catalysis polymerization research

Cited by: 4
Authors
Ho, David [1]
Shkolnik, Albert S. [1]
Ferraro, Neil J. [1]
Rizkin, Benjamin A. [1]
Hartman, Ryan L. [1]
Affiliation
[1] NYU, Dept Chem & Biomol Engn, 6 Metrotech Ctr, Brooklyn, NY 11201 USA
Funding
U.S. National Science Foundation
Keywords
Machine learning; Metallocene catalysis; Word embeddings; Polymerization; Natural language; KNOWLEDGE
DOI
10.1016/j.compchemeng.2020.107026
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Natural language processing (NLP) and neural networks trained to produce word embeddings were investigated as a more efficient way to extract useful information on catalytic polymerizations. Thousands of abstracts on metallocene-catalyzed polymerizations were accessed through journal Application Programming Interfaces (APIs). These abstracts were then used to create a group of related models that produce word embeddings using the word2vec algorithm, which turns vocabulary into high-dimensional vectors through unsupervised training. These vectors can then be used to show relationships between chemicals, suggest catalyst and activator combinations, interpret acronyms, and categorize chemical compounds by reagent class. We hypothesize that one can determine which areas of metallocene catalysis are understudied by comparing the predicted activator and catalyst combinations with those found in existing abstracts, thereby guiding research toward major breakthroughs as the scientific literature continues to grow. © 2020 Elsevier Ltd. All rights reserved.
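To make the idea of "vocabulary as vectors" concrete, the following is a minimal sketch, not the authors' pipeline. It swaps word2vec for a much simpler count-based stand-in (co-occurrence vectors compared by cosine similarity) so that it runs with the Python standard library alone. The corpus sentences, the window size, and the function names `cooccurrence_vectors`, `cosine`, and `most_similar` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy corpus; these sentences are NOT from the study's abstracts.
CORPUS = [
    "zirconocene activated by mao polymerizes ethylene",
    "hafnocene activated by mao polymerizes propylene",
    "zirconocene activated by borate polymerizes propylene",
    "titanocene activated by mao polymerizes ethylene",
]


def cooccurrence_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return dict(vectors)  # plain dict: unknown words raise KeyError


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def most_similar(word, vectors, topn=3):
    """Rank all other words by cosine similarity to `word`."""
    target = vectors[word]
    scores = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


vectors = cooccurrence_vectors(CORPUS)
print(most_similar("zirconocene", vectors))
```

In this toy corpus the metallocenes appear in the same contexts, so they rank as nearest neighbours of one another. A learned embedding such as word2vec captures the same distributional signal in dense vectors, which is what makes queries about related chemicals or plausible catalyst and activator pairings possible.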
Pages: 8