Data fusion by joint non-negative matrix factorization for hypothesizing pseudo-chemistry using Bayesian networks

Cited by: 11
Authors
Puliyanda, Anjana [1]
Sivaramakrishnan, Kaushik [1]
Li, Zukui [1]
de Klerk, Arno [1]
Prasad, Vinay [1]
Affiliations
[1] Dept Chem & Mat Engn, 9211 116 St NW, Edmonton, AB T6G 1H9, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
CURVE RESOLUTION; NUMBER; COMPONENTS; BITUMEN; ALGORITHMS; EXPRESSION; CONVERSION; PREDICT; MODEL; RAMAN;
DOI
10.1039/d0re00147c
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Inferring the reaction pathways underlying the processing of complex feeds, using noisy data from spectral sensors that may contain information regarding molecular mechanisms, is challenging. This is tackled by a two-step approach for the partial upgrading of Cold Lake bitumen: first, joint non-negative matrix factorization (JNMF) is used as a data fusion algorithm to extract pseudocomponent spectra by combining complementary information about the reacting environment from Fourier transform infrared (FTIR) and proton nuclear magnetic resonance (¹H NMR) spectroscopic sensors. Second, a probabilistic inferential model that hypothesizes reaction mechanisms among the identified pseudocomponent spectra is constructed using Bayesian networks that encode directed acyclic causal pathways among the nodes of the random variables (pseudocomponent spectra). The JNMF algorithm has been developed to handle process data artefacts by imputing missing data, using a rotationally invariant norm for robustness to outliers and noise, and enforcing the non-negativity constraint to ensure physical interpretability in compliance with Beer's law for spectral data. The projected optimal gradient approach developed to solve the JNMF objective converges within fewer iterations at the specified tolerance as compared to the multiplicative update rules (MUR). Solution ambiguity in JNMF is limited by incorporating graph regularization terms: (a) inter-sensor co-regularization that penalizes redundancy in the pseudocomponent spectra across spectral sensors, and (b) intra-spectral manifold regularization that penalizes overfitting of the pseudocomponent spectra from each sensor by penalizing redundant peaks within a spectrum.
Weighting the intra-spectral regularization term that minimizes similarly correlated peaks across spectral channels of a sensor to zero is seen to result in chemically meaningful pseudocomponent spectra, given that different organic compounds share similar properties with respect to their hydrocarbon structure. Hence, the preferential weighting of regularizers is shown to act as a chemical information sieve by controlling the peaks that appear in the pseudocomponent spectra, thereby enabling the proposal of different reaction mechanisms, based on the similarity metric used to model the graph structure.
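As a rough illustration of the data fusion step described in the abstract, the following Python sketch factorizes two non-negative spectral matrices with one shared concentration-like factor. It is a simplified stand-in and not the authors' algorithm: it assumes a plain Frobenius-norm objective, uses the multiplicative update rules (MUR) that the abstract mentions only as a baseline, and omits missing-data imputation, the rotationally invariant norm, and both graph regularizers. The function name `joint_nmf`, the shared factor `C`, and all parameters are hypothetical.

```python
import numpy as np

def joint_nmf(X1, X2, k, n_iter=500, eps=1e-9, seed=0):
    """Jointly factorize X1 ~ C @ S1 and X2 ~ C @ S2 with all factors >= 0.

    X1, X2 : non-negative arrays with the same number of rows (samples),
             e.g. spectra from two sensors measured on the same samples.
    k      : assumed number of pseudocomponents.
    Returns the shared factor C, the per-sensor spectra S1 and S2, and the
    squared Frobenius loss recorded after every iteration.
    """
    rng = np.random.default_rng(seed)
    n = X1.shape[0]
    C = rng.random((n, k))
    S1 = rng.random((k, X1.shape[1]))
    S2 = rng.random((k, X2.shape[1]))
    losses = []
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        S1 *= (C.T @ X1) / (C.T @ C @ S1 + eps)
        S2 *= (C.T @ X2) / (C.T @ C @ S2 + eps)
        # The shared factor sees the residuals of both sensors.
        C *= (X1 @ S1.T + X2 @ S2.T) / (C @ (S1 @ S1.T + S2 @ S2.T) + eps)
        loss = (np.linalg.norm(X1 - C @ S1) ** 2
                + np.linalg.norm(X2 - C @ S2) ** 2)
        losses.append(loss)
    return C, S1, S2, losses

# Synthetic example: 20 samples, 3 pseudocomponents, two sensors.
rng = np.random.default_rng(1)
C_true = rng.random((20, 3))
X_a = C_true @ rng.random((3, 50))   # stand-in for one sensor's spectra
X_b = C_true @ rng.random((3, 40))   # stand-in for the other sensor's spectra
C_hat, Sa_hat, Sb_hat, history = joint_nmf(X_a, X_b, k=3)
print(history[0], history[-1])
```

The rows of `Sa_hat` and `Sb_hat` play the role of pseudocomponent spectra per sensor. Because the factorization is not unique, a different seed can return a different but equally well-fitting solution, which is the ambiguity the abstract's regularization terms are meant to limit.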
Pages: 1719-1737
Page count: 19