GAEM: Genetic Algorithm based Expectation-Maximization for inferring Gene Regulatory Networks from incomplete data

Cited by: 1
Authors
Niloofar, Parisa [1]
Aghdam, Rosa [2,4]
Eslahchi, Changiz [3,4]
Affiliations
[1] Mærsk Mc-Kinney Møller Institute, University of Southern Denmark, Campusvej 55, Odense
[2] Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI
[3] Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University
[4] School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM)
Funding
National Science Foundation (United States)
Keywords
Bayesian network; Conditional Mutual Information; Expectation-Maximization; Gene Regulatory Network; Genetic algorithm; Missing values
DOI
10.1016/j.compbiomed.2024.109238
Abstract
In bioinformatics, inferring the structure of a Gene Regulatory Network (GRN) from incomplete gene expression data is a difficult task. One popular method for inferring the structure of GRNs is the Path Consistency Algorithm based on Conditional Mutual Information (PCA-CMI). Although PCA-CMI excels at extracting GRN skeletons, it struggles with missing values in datasets. As a result, applying PCA-CMI to infer GRNs requires a preprocessing step for data imputation. In this paper, we present the GAEM algorithm, which combines a Genetic Algorithm with Expectation-Maximization in an iterative scheme to infer the structure of a GRN from incomplete gene expression datasets. GAEM learns the GRN structure from the incomplete dataset and iteratively updates the imputed values based on the learnt GRN until the convergence criteria are met. We evaluate the performance of this algorithm under various missingness mechanisms (ignorable and nonignorable) and missingness percentages (5%, 15%, and 40%). The traditional approach to handling missing values in gene expression datasets estimates them first and then constructs the GRN. Our methodology differs in that both the missing values and the GRN are updated iteratively until convergence. Results on the DREAM3 dataset demonstrate that GAEM appears to be a more reliable method overall: especially for smaller network sizes, it outperforms methods that impute the incomplete dataset first and then learn the GRN structure from the imputed data. We have implemented the GAEM algorithm in the GAEM R package, which is available at the following GitHub repository: https://github.com/parniSDU/GAEM. © 2024
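The alternating scheme described in the abstract (learn a network from the current imputed data, re-impute the missing values from that network, repeat until the imputed values stop changing) can be illustrated with a minimal Python sketch. This is not the GAEM algorithm and does not reflect the API of the GAEM R package: the structure learner is a simple stand-in that thresholds zero-order mutual information under a Gaussian assumption, the imputation step is a least-squares regression on the current neighbours, and there is no genetic algorithm or EM step. All function names, the `threshold`, `tol`, and `max_iter` parameters, and the toy data are assumptions made for illustration only.

```python
import numpy as np

def gaussian_mi(corr):
    """Pairwise mutual information under a Gaussian assumption: -0.5 * log(1 - rho^2)."""
    rho2 = np.clip(corr ** 2, 0.0, 1.0 - 1e-12)
    mi = -0.5 * np.log(1.0 - rho2)
    np.fill_diagonal(mi, 0.0)  # no self-edges
    return mi

def learn_skeleton(data, threshold=0.05):
    """Stand-in structure learner (assumption): keep edge i-j if zero-order MI > threshold."""
    corr = np.corrcoef(data, rowvar=False)
    return gaussian_mi(corr) > threshold

def impute_from_skeleton(data, missing, adjacency):
    """Re-estimate the missing entries of each gene by least squares on its neighbours."""
    updated = data.copy()
    n_samples, n_genes = data.shape
    for j in range(n_genes):
        rows = missing[:, j]
        neighbours = np.flatnonzero(adjacency[j])
        if not rows.any() or rows.all() or neighbours.size == 0:
            continue  # nothing to impute, nothing to fit on, or no predictors
        design = np.column_stack([np.ones(n_samples), data[:, neighbours]])
        # Fit only on rows where gene j was actually observed.
        coef, *_ = np.linalg.lstsq(design[~rows], data[~rows, j], rcond=None)
        updated[rows, j] = design[rows] @ coef
    return updated

def iterative_impute_and_learn(incomplete, threshold=0.05, tol=1e-6, max_iter=50):
    """Alternate between structure learning and imputation until the imputed values settle."""
    missing = np.isnan(incomplete)
    col_means = np.nanmean(incomplete, axis=0)
    data = np.where(missing, col_means, incomplete)  # initial guess: column means
    for _ in range(max_iter):
        adjacency = learn_skeleton(data, threshold)
        new_data = impute_from_skeleton(data, missing, adjacency)
        change = np.max(np.abs(new_data - data))
        data = new_data
        if change < tol:
            break
    return data, learn_skeleton(data, threshold)

# Toy usage: gene 1 depends on gene 0, gene 2 is independent noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + 0.1 * rng.normal(size=200)
z = rng.normal(size=200)
complete = np.column_stack([x, y, z])
incomplete = complete.copy()
incomplete[::10, 1] = np.nan  # remove every tenth value of gene 1
imputed, skeleton = iterative_impute_and_learn(incomplete)
```

The design point the sketch tries to convey is the one the abstract contrasts with the traditional pipeline: imputation is not a one-off preprocessing step, because each pass uses the currently learnt network to decide which genes inform the imputed values.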