Information extraction from semi-structured data in the protein data bank by induction of a data description pattern

Authors
Kawaguchi, Y [1]
Kaneta, Y [1]
Ohkawa, T [1]
Nakamura, H [1]
Ito, N [1]
Affiliations
[1] Osaka Univ, Grad Sch Informat Sci & Technol, Osaka, Japan
Source
METMBS'03: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON MATHEMATICS AND ENGINEERING TECHNIQUES IN MEDICINE AND BIOLOGICAL SCIENCES | 2003
Keywords
Protein Data Bank; XML; information extraction; description pattern; induction;
DOI
Not available
Chinese Library Classification
R19 [Health organizations and services (health service administration)];
Abstract
The PDB (Protein Data Bank) is a primary database that stores three-dimensional structural data of proteins. This paper proposes a system, the PDB REMARK transcoder, that semi-automatically extracts significant data from REMARK lines, a part of the PDB data, and transcodes them into XML (eXtensible Markup Language) format. The system induces a description pattern from a set of protein entries so that it can accept gradual variations in REMARK lines. Tokens (words and phrases) are clustered by evaluating their similarity based on token attributes, and their contents are recognized from the cluster labels. Description patterns are induced using finite state automata, and iterative structures are nested correspondingly in the XML output. Log files are used to confirm the reliability of the output XML data. Applying the system to the REMARK lines of 8,906 protein entries demonstrated the effectiveness of the method.
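The general idea of turning REMARK lines into XML can be illustrated with a minimal Python sketch. It is not the authors' system: a crude shape-based token classifier stands in for similarity clustering, and collapsing runs of the same class into a repeated symbol stands in for inducing an automaton with iteration. The sample lines, their values, and the names `token_class`, `induce_pattern`, and `to_xml` are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET
from itertools import groupby

# Illustrative REMARK 3 lines in "NAME : VALUE" form (values are made up).
SAMPLE = [
    "REMARK   3   RESOLUTION RANGE HIGH (ANGSTROMS) : 2.00",
    "REMARK   3   RESOLUTION RANGE LOW  (ANGSTROMS) : 20.0",
    "REMARK   3   NUMBER OF REFLECTIONS             : 18542",
]


def token_class(token):
    """Crude stand-in for a cluster label: classify a token by its shape."""
    if re.fullmatch(r"[+-]?\d+(\.\d+)?", token):
        return "NUM"
    if token == ":":
        return "SEP"
    return "WORD"


def induce_pattern(text):
    """Collapse runs of one class into a repeated symbol (WORD WORD -> WORD+)."""
    classes = [token_class(t) for t in text.split()]
    return " ".join(
        cls + "+" if len(list(run)) > 1 else cls
        for cls, run in groupby(classes)
    )


def to_xml(lines):
    """Emit one <item> per REMARK line, tagged with its induced pattern."""
    root = ET.Element("remarks")
    for line in lines:
        number = line[6:10].strip()   # remark number field
        text = line[10:].strip()      # free-text part of the line
        name, _, value = text.partition(":")
        item = ET.SubElement(
            root, "item", remark=number, pattern=induce_pattern(text)
        )
        ET.SubElement(item, "name").text = " ".join(name.split())
        ET.SubElement(item, "value").text = value.strip()
    return root


if __name__ == "__main__":
    print(ET.tostring(to_xml(SAMPLE), encoding="unicode"))
```

In this sketch the three sample lines share one pattern even though their names differ in length, which is the kind of variation a description pattern is meant to absorb.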
Pages: 94-99
Page count: 6