Browsing large-scale cheminformatics data with dimension reduction

Cited by: 1
Authors
Choi, Jong Youl [1]
Bae, Seung-Hee [1]
Qiu, Judy [1]
Chen, Bin
Wild, David
Affiliations
[1] Indiana Univ, Sch Informat & Comp, Pervas Technol Inst, Bloomington, IN 47408 USA
Keywords
visualization; MDS; GTM; interpolation; semantic web; biology
DOI
10.1002/cpe.1781
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Subject classification codes
081202; 0835
Abstract
Visualization of large-scale, high-dimensional data is highly valuable for data analysis and facilitates scientific discovery in many fields. We present PubChemBrowse, a customized visualization tool for cheminformatics research. It provides a novel 3D data point browser that displays complex properties of massive data on commodity clients. As in Geographic Information System browsers for Earth and environment data, chemical compounds with similar properties are placed near each other in the browser. PubChemBrowse is built around in-house high-performance parallel multidimensional scaling (MDS) and generative topographic mapping (GTM) services and supports fast interaction with an external property database. These properties can be overlaid on the 3D mapped compound space or queried for individual points. We prototype the integration with the Chem2Bio2RDF system, using a SPARQL endpoint to access over 20 publicly accessible bioinformatics databases. We describe the design and implementation of the integrated PubChemBrowse application and outline its use in drug discovery. The same core technologies are generally applicable to developing high-performance scientific data browsing systems for other applications. Copyright © 2011 John Wiley & Sons, Ltd.
Pages: 2315-2325
Number of pages: 11