Structured Literature Image Finder: Extracting Information from Text and Images in Biomedical Literature

Cited by: 25
Authors
Coelho, Luis Pedro [1, 2, 3]
Ahmed, Amr [4, 5]
Arnold, Andrew [4]
Kangas, Joshua [1, 2, 3]
Sheikh, Abdul-Saboor [3]
Xing, Eric P. [1, 2, 3, 4, 5, 6]
Cohen, William W. [1, 2, 3, 4]
Murphy, Robert F. [1, 2, 3, 4, 6, 7]
Affiliations
[1] Carnegie Mellon Univ, Lane Ctr Computat Biol, Pittsburgh, PA 15213 USA
[2] Univ Pittsburgh Ph D Program Computat Biol, Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[3] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[4] Machine Learning Dept, Pittsburgh, PA 15213 USA
[5] Language Technol Inst, Pittsburgh, PA 15213 USA
[6] Dept Biol Sci, Pittsburgh, PA 15213 USA
[7] Dept Biomed Engn, Pittsburgh, PA 15213 USA
Source
LINKING LITERATURE, INFORMATION, AND KNOWLEDGE FOR BIOLOGY | 2010 / Vol. 6004
Keywords
DOI
10.1007/978-3-642-13131-8_4
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification code
071010; 081704;
Abstract
SLIF uses a combination of text-mining and image processing to extract information from figures in the biomedical literature. It also uses innovative extensions to traditional latent topic modeling to provide new ways to traverse the literature. SLIF provides a publicly available searchable database (http://slif.cbi.cmu.edu). SLIF originally focused on fluorescence microscopy images. We have now extended it to classify panels into more image types. We also improved the classification into subcellular classes by building a more representative training set. To get the most out of the human labeling effort, we used active learning to select images to label. We developed models that take into account the structure of the document (with panels inside figures inside papers) and the multi-modality of the information (free and annotated text, images, information from external databases). This has allowed us to provide new ways to navigate a large collection of documents.
Pages: 23 / +
Page count: 3