iTextMine: integrated text-mining system for large-scale knowledge extraction from the literature

Cited by: 14
Authors
Ren, Jia [1 ]
Li, Gang [2 ]
Ross, Karen [3 ]
Arighi, Cecilia [1 ,2 ]
McGarvey, Peter [3 ,4 ]
Rao, Shruti [4 ]
Cowart, Julie [1 ]
Madhavan, Subha [4 ,5 ]
Vijay-Shanker, K. [2 ]
Wu, Cathy H. [1 ,2 ,3 ]
Affiliations
[1] Univ Delaware, Ctr Bioinformat & Computat Biol, Newark, DE 19711 USA
[2] Univ Delaware, Dept Comp & Informat Sci, Newark, DE 19716 USA
[3] Georgetown Univ, Med Ctr, Prot Informat Resource, Washington, DC 20007 USA
[4] Georgetown Univ, Innovat Ctr Biomed Informat, Washington, DC 20007 USA
[5] Georgetown Univ, Med Ctr, Lombardi Comprehens Canc Ctr, Washington, DC 20057 USA
Source
DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION | 2018
Funding
National Institutes of Health (NIH);
Keywords
BINDING-PROTEIN; 1; MULTIDRUG-RESISTANCE; SATB1; PHOSPHORYLATION; INVASION;
DOI
10.1093/database/bay128
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Numerous efforts have been made for developing text-mining tools to extract information from biomedical text automatically. They have assisted in many biological tasks, such as database curation and hypothesis generation. Text-mining tools are usually different from each other in terms of programming language, system dependency and input/output format. There are few previous works that concern the integration of different text-mining tools and their results from large-scale text processing. In this paper, we describe the iTextMine system with an automated workflow to run multiple text-mining tools on large-scale text for knowledge extraction. We employ parallel processing with dockerized text-mining tools with a standardized JSON output format and implement a text alignment algorithm to solve the text discrepancy for result integration. iTextMine presently integrates four relation extraction tools, which have been used to process all the Medline abstracts and PMC open access full-length articles. The website allows users to browse the text evidence and view integrated results for knowledge discovery through a network view. We demonstrate the utilities of iTextMine with two use cases involving the gene PTEN and breast cancer and the gene SATB1.
Pages: 10
Related papers
50 in total
  • [1] Causal Knowledge Extraction through Large-Scale Text Mining
    Hassanzadeh, Oktie
    Bhattacharjya, Debarun
    Feblowitz, Mark
    Srinivas, Kavitha
    Perrone, Michael
    Sohrabi, Shirin
    Katz, Michael
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 13610 - 13611
  • [2] BioContext: an integrated text mining system for large-scale extraction and contextualization of biomolecular events
    Gerner, Martin
    Sarafraz, Farzaneh
    Bergman, Casey M.
    Nenadic, Goran
    BIOINFORMATICS, 2012, 28 (16) : 2154 - 2161
  • [3] Large-Scale Extraction and Use of Knowledge from Text
    Clark, Peter
    Harrison, Phil
    K-CAP'09: PROCEEDINGS OF THE FIFTH INTERNATIONAL CONFERENCE ON KNOWLEDGE CAPTURE, 2009, : 153 - 160
  • [4] Mining Large-scale Event Knowledge from Web Text
    Cao, Ya-nan
    Zhang, Peng
    Guo, Jing
    Guo, Li
    2014 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE, 2014, 29 : 478 - 487
  • [5] Large-Scale Text Mining of Biomedical Literature
    Ginter, Filip
    ELECTRONIC PROCEEDINGS IN THEORETICAL COMPUTER SCIENCE, 2013, (116): : 43 - 44
  • [6] Graph Clustering for Large-Scale Text-Mining of Brain Imaging Studies
    Chawla, Manisha
    Mesa, Mounika
    Miyapuram, Krishna P.
    PROCEEDING OF THE THIRD INTERNATIONAL SYMPOSIUM ON WOMEN IN COMPUTING AND INFORMATICS (WCI-2015), 2015, : 163 - 168
  • [7] Graph clustering for large-scale text-mining of brain imaging studies
    Center for Cognitive Science, Indian Institute of Technology, Gandhinagar, Ahmedabad, India
    Unknown
    Unknown
    ACM Int. Conf. Proc. Ser., (163-168):
  • [8] Temporal knowledge extraction from large-scale text corpus
    Yu Liu
    Wen Hua
    Xiaofang Zhou
    World Wide Web, 2021, 24 : 135 - 156
  • [9] Temporal knowledge extraction from large-scale text corpus
    Liu, Yu
    Hua, Wen
    Zhou, Xiaofang
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2021, 24 (01): : 135 - 156
  • [10] A text-mining system for knowledge discovery from Biomedical Documents
    Uramoto, N
    Matsuzawa, H
    Nagano, T
    Murakami, A
    Takeuchi, H
    Takeda, K
    IBM SYSTEMS JOURNAL, 2004, 43 (03) : 516 - 533