Classification of metabolites by metabolic pathways concerning terpenoids, phenylpropanoids, and polyketide compounds based on machine learning

Times cited: 0
Authors
Koide, Yuri [1]
Koge, Daiki [1]
Kanaya, Shigehiko [1]
Altaf-Ul-Amin, Md. [1]
Huang, Ming [1]
Morita, Aki Hirai [1]
Ono, Naoaki [1]
Affiliations
[1] Nara Inst Sci & Technol, Grad Sch Sci & Technol, Div Informat Sci, Computat Syst Biol Lab, Takayama Cho 8916-5, Ikoma Shi, Nara 6300119, Japan
Keywords
random forest; terpenoids; phenylpropanoids; polyketides; machine learning; KNApSAcK Core DB; biosynthesis
DOI
Not available
Chinese Library Classification number
O6 [Chemistry]
Subject classification code
0703
Abstract
Terpenoids, phenylpropanoids, and polyketides account for the majority of secondary metabolites containing carbon, hydrogen, and oxygen. In this work, 19,769 metabolites accumulated in KNApSAcK Core DB were classified, according to the scientific literature, into 71 subgroups belonging to three major groups (terpenoids, phenylpropanoids, and polyketides). We represented the metabolites as molecular fingerprints that include chemical properties and used these descriptors for classification with a random forest model. Both training and test metabolites were classified well into the subgroups, with accuracies of 94.06% and 94.23%, respectively. Although classifying metabolites by metabolic pathway is very time-consuming work, machine learning with molecular fingerprints made the classification attainable. This work will shed light on the systematic and evolutionary understanding of diversified secondary metabolites based on secondary metabolic pathways. Data science is an interdisciplinary, applied field that uses techniques and theories drawn from statistics, mathematics, computer science, and information science. By combining these resources, data science makes it possible to extract meaningful and practical insights about secondary metabolites.
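The workflow described in the abstract (fingerprint descriptors fed to a random forest, then accuracy reported on training and test sets) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the fingerprints here are synthetic 64-bit vectors generated at random rather than computed from KNApSAcK structures, only the three major groups are used as labels instead of the 71 subgroups, and the 80/20 split, the 5% bit-flip noise, and the forest size are arbitrary choices. The scikit-learn calls are standard, but every constant and variable name is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Assumption: toy setup, not the paper's data or descriptors.
N_BITS = 64                       # length of each synthetic fingerprint
CLASSES = ["terpenoid", "phenylpropanoid", "polyketide"]
N_PER_CLASS = 200                 # synthetic "metabolites" per class

# One random prototype bit pattern per class stands in for shared substructures.
prototypes = rng.integers(0, 2, size=(len(CLASSES), N_BITS))

X_parts, y_parts = [], []
for label, proto in enumerate(prototypes):
    # Flip about 5% of the bits so that members of a class differ from each other.
    flips = rng.random((N_PER_CLASS, N_BITS)) < 0.05
    X_parts.append(np.where(flips, 1 - proto, proto))
    y_parts.append(np.full(N_PER_CLASS, label))

X = np.vstack(X_parts)            # shape: (600, 64) binary fingerprint matrix
y = np.concatenate(y_parts)       # integer class labels, 0..2

# Hold out 20% of the samples as a test set, stratified by class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit a random forest on the fingerprint bits.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Report accuracy separately for training and test samples.
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"training accuracy: {train_acc:.4f}")
print(f"test accuracy:     {test_acc:.4f}")
```

On real data, the synthetic matrix X would be replaced by fingerprints computed from the metabolite structures, and y by the literature-derived subgroup labels.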
Pages: 10