A machine learning framework for extracting information from biological pathway images in the literature

Cited by: 0
Authors
Kwon, Mun Su [1]
Lee, Junkyu [1]
Kim, Hyun Uk [1, 2, 3]
Affiliations
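[1] Korea Advanced Institute of Science and Technology (KAIST), Department of Chemical and Biomolecular Engineering, Daejeon 34141, South Korea
[2] Korea Advanced Institute of Science and Technology, Graduate School of Engineering Biology, Daejeon 34141, South Korea
[3] Korea Advanced Institute of Science and Technology, BioProcess Engineering Research Center, Daejeon 34141, South Korea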
Funding
National Research Foundation, Singapore
Keywords
Literature mining; Information extraction; Metabolic engineering; Biological pathway images; Object detection; RECOGNITION; ACID;
DOI
10.1016/j.ymben.2024.09.001
Chinese Library Classification (CLC) number
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology];
Subject classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Literature mining has advanced significantly, enabling the extraction of target information from the literature. However, biological literature often includes biological pathway images whose content is difficult to extract in an easily editable format. To address this challenge, this study develops a machine learning framework called "Extraction of Biological Pathway Information" (EBPI). The framework automates the search for relevant publications, extracts biological pathway information (genes, enzymes, and metabolites) from images within the literature, and generates the output in a tabular format. To do so, the framework determines the direction of biochemical reactions and detects and classifies text within biological pathway images. The performance of EBPI was evaluated by comparing the extracted pathway information with manually curated pathway maps. EBPI will be useful for extracting biological pathway information from the literature in a high-throughput manner and can be used for pathway studies, including metabolic engineering.
Pages: 1-11
Page count: 11