A graph convolution network with subgraph embedding for mutagenic prediction in aromatic hydrocarbons

Cited by: 8
Authors
Moon, Hyung-Jun [1]
Bu, Seok-Jun [2]
Cho, Sung-Bae [2]
Affiliations
[1] Yonsei Univ, Dept Artificial Intelligence, Seoul 03722, South Korea
[2] Yonsei Univ, Dept Comp Sci, Seoul 03722, South Korea
Keywords
Mutagenic prediction; Deep learning; Graph convolution network; Graph partitioning algorithm; Classification; Models; Amines; DNA
DOI
10.1016/j.neucom.2023.01.091
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
An aromatic hydrocarbon is an organic compound containing a carbon ring, such as benzene, with a functional group attached to the ring. As industry develops, environmental pollution worsens, new compounds emerge, and exposure to aromatic hydrocarbons continues to increase. Predicting mutagenicity is crucial to reducing this risk, because these compounds may penetrate the DNA of living organisms and cause mutations. Deep learning has recently improved the accuracy of mutagenicity prediction. However, most conventional methods do not account for the structural characteristics of aromatic hydrocarbon molecules, which dilutes local information and severely degrades prediction performance. In this paper, we propose a subgraph convolution network that extracts local information from a graph by partitioning it so that detailed information is preserved. To extract molecular features, we use the Girvan-Newman algorithm to partition the graph according to its carbon rings and functional groups, and we obtain embedding vectors for the subgraphs as well as for the original graph with a graph convolution network (GCN). These embedding vectors are combined to represent the information of the whole graph and to predict mutagenicity. Experiments on MUTAG, NCI1 and NCI109, datasets for predicting the mutagenicity of molecules represented as graphs, confirm that the method successfully segments carbon rings and functional groups from molecular graphs and predicts mutations using the partitioned graphs, yielding an improvement of 2 percentage points. In addition, the proposed method prevents about 15 percentage points of information dilution in the GCN, and an analysis of the graphs' latent space shows that the extracted subgraphs appropriately preserve local information.
© 2023 Elsevier B.V. All rights reserved.
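The pipeline summarized in the abstract (partition the molecular graph with Girvan-Newman, embed the whole graph and each subgraph with a GCN, then combine the embeddings) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, partitioning stops once the graph splits into `k` connected components, atom features are a hypothetical two-dimensional carbon/non-carbon one-hot, the GCN is a single layer with symmetric normalization followed by mean pooling, and the embeddings are combined by concatenating the whole-graph vector with the mean of the subgraph vectors (the abstract says only that they are "combined"). The example uses the heavy-atom graph of nitrobenzene, where the ring and the nitro group should separate.

```python
import math
from collections import deque

def edge_betweenness(adj):
    """Brandes edge betweenness for an unweighted, undirected graph {node: set(neighbors)}."""
    eb = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        order, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0.0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1.0, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Accumulate dependencies from the farthest nodes back toward s.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in pred[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[frozenset((v, w))] += c
                delta[v] += c
    # Every unordered node pair was counted once from each endpoint.
    return {e: b / 2.0 for e, b in eb.items()}

def components(adj):
    """Connected components as node sets, in first-seen order."""
    seen, comps = set(), []
    for s in adj:
        if s not in seen:
            comp, stack = {s}, [s]
            while stack:
                for w in adj[stack.pop()] - comp:
                    comp.add(w)
                    stack.append(w)
            seen |= comp
            comps.append(comp)
    return comps

def girvan_newman(adj, k):
    """Remove the highest-betweenness edge until there are at least k components (assumed stop rule)."""
    work = {v: set(ns) for v, ns in adj.items()}  # copy; the caller's graph is untouched
    comps = components(work)
    while len(comps) < k and any(work.values()):
        eb = edge_betweenness(work)
        u, v = tuple(max(eb, key=eb.get))
        work[u].discard(v)
        work[v].discard(u)
        comps = components(work)
    return comps

def gcn_embed(adj, nodes, H, W):
    """One GCN layer ReLU(D^-1/2 (A+I) D^-1/2 H W) on the induced subgraph, then mean pooling."""
    node_set = set(nodes)
    nbrs = {v: [u for u in adj[v] if u in node_set] for v in node_set}
    deg = {v: 1 + len(nbrs[v]) for v in node_set}  # +1 for the self-loop
    out_dim = len(W[0])
    pooled = [0.0] * out_dim
    for v in node_set:
        agg = [0.0] * len(H[v])
        for u in [v] + nbrs[v]:
            norm = 1.0 / math.sqrt(deg[v] * deg[u])
            agg = [a + norm * h for a, h in zip(agg, H[u])]
        z = [max(0.0, sum(agg[i] * W[i][j] for i in range(len(agg)))) for j in range(out_dim)]
        pooled = [p + x / len(node_set) for p, x in zip(pooled, z)]
    return pooled

def molecule_embedding(adj, H, W, k=2):
    """Concatenate the whole-graph embedding with the mean subgraph embedding (assumed combination)."""
    whole = gcn_embed(adj, adj, H, W)
    parts = girvan_newman(adj, k)
    subs = [gcn_embed(adj, p, H, W) for p in parts]
    sub_mean = [sum(s[j] for s in subs) / len(subs) for j in range(len(whole))]
    return whole + sub_mean, parts

# Heavy-atom graph of nitrobenzene: ring carbons 0-5, nitrogen 6, oxygens 7 and 8.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (6, 7), (6, 8)]
adj = {v: set() for v in range(9)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)
H = {v: [1.0, 0.0] if v < 6 else [0.0, 1.0] for v in adj}  # hypothetical carbon / non-carbon one-hot
W = [[0.5, 0.2, 0.1], [0.3, 0.4, 0.6]]                       # hypothetical 2x3 weight matrix
vec, parts = molecule_embedding(adj, H, W, k=2)
print([sorted(p) for p in parts])  # [[0, 1, 2, 3, 4, 5], [6, 7, 8]]
```

The ring-nitrogen bond is removed first because every shortest path between a ring atom and a nitro-group atom crosses it (6 × 3 = 18 pairs), more than any ring bond carries. Each subgraph embedding is computed on the induced subgraph of the original molecule, so ring bonds that Girvan-Newman might remove before the split would still be used by the GCN.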
Pages: 60-68
Number of pages: 9