Information extraction from semi-structured data in the protein data bank by induction of a data description pattern

Authors
Kawaguchi, Y [1]
Kaneta, Y [1]
Ohkawa, T [1]
Nakamura, H [1]
Ito, N [1]
Affiliations
[1] Osaka Univ, Grad Sch Informat Sci & Technol, Osaka, Japan
Source
METMBS'03: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON MATHEMATICS AND ENGINEERING TECHNIQUES IN MEDICINE AND BIOLOGICAL SCIENCES | 2003
Keywords
Protein Data Bank; XML; information extraction; description pattern; induction;
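DOI
Not available
Chinese Library Classification number
R19 [Health care organization and services (health services administration)];
Subject classification code
Abstract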
The Protein Data Bank (PDB) is a primary database that stores three-dimensional protein structure data. This paper proposes a system, the PDB REMARK transcoder, that semi-automatically extracts significant data from REMARK lines, a part of the PDB data, and transcodes them into XML (eXtensible Markup Language) format. The system induces a description pattern from a set of protein entries so that it can accept gradual variations in REMARK lines. Tokens (words and phrases) are clustered by evaluating their similarity using token attributes, and their contents are recognized by cluster labels. Description patterns are induced using finite state automata, and iterative structures are correspondingly nested in the XML output. The confidence of the output XML data is confirmed by log files. Applying the system to the REMARK lines of 8,906 protein entries demonstrated the effectiveness of the method.
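The general idea of labeling tokens, inducing a pattern with repetition, and nesting the repeated part in XML can be illustrated with a small Python sketch. This is not the authors' implementation: the fixed regular-expression rules in `label` stand in for the similarity-based clustering, a simple collapse of consecutive identical labels stands in for the finite-state-automaton induction, and the function names, element names, and sample tokens are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def label(token):
    """Assign a coarse label from surface attributes (assumed stand-in for clustering)."""
    if re.fullmatch(r"[-+]?\d+", token):
        return "INT"
    if re.fullmatch(r"[-+]?\d*\.\d+", token):
        return "REAL"
    return "WORD"


def induce_pattern(labels):
    """Collapse runs of the same label into (label, repeated) pairs."""
    pattern = []
    for lab in labels:
        if pattern and pattern[-1][0] == lab:
            pattern[-1] = (lab, True)  # mark the run as an iterative structure
        else:
            pattern.append((lab, False))
    return pattern


def to_xml(tokens):
    """Emit XML in which each repeated run is nested under a <repeat> element."""
    labels = [label(t) for t in tokens]
    root = ET.Element("remark")
    i = 0
    for lab, repeated in induce_pattern(labels):
        parent = ET.SubElement(root, "repeat", label=lab) if repeated else root
        # Consume every token belonging to the current run.
        while i < len(tokens) and labels[i] == lab:
            ET.SubElement(parent, lab.lower()).text = tokens[i]
            i += 1
    return ET.tostring(root, encoding="unicode")


# Made-up token sequence, not taken from a real PDB entry.
print(to_xml(["CHAINS", "1", "2", "3"]))
```

In this toy version a run of any length maps to the same pattern, which loosely mirrors how an induced pattern can tolerate variation between entries; the system described in the abstract handles far richer variation than this sketch does.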
Pages: 94-99
Number of pages: 6